Maths for Biology July 6th 2015 Christian Bokhove Carys Hughes Hilary Otter Rebecca D’Silva Nicky Miller
Why this day
Changes in the A level Biology curriculum
More maths
Maths used across disciplines
Principles
–Instruction but also hands-on tasks
–Collaborative, doing it together, ask questions
–Want to customise the course to where you need support
Web page with resources
Introductions Christian Bokhove –Lecturer in Mathematics Education Carys Hughes Hilary Otter Rebecca D’Silva Nicky Miller
Objectives of the day
Update and strengthen knowledge on some topics for the new Biology A level curriculum:
–Exponential growth and decay, logarithms
–Statistical tests
Hear and discuss ideas for teaching them
We want to hear your opinions for improvements
You leave with:
–Ideas, knowledge and some resources
Schedule for the day
When | What
9:00 – 9:30 | Welcome, introductions
9:30 – 10:15 | Exponential growth and decay
10:15 – 12:30 | Logarithms and log paper
12:30 – 13:30 | Lunch break
13:30 – 14:00 | Statistical tests
14:00 – | Mini workshops statistics
MATHS CONFIDENCE SURVEY
EXPONENTIAL GROWTH AND DECAY, LOGARITHMS
Slide exam curriculum
From (i) A.0 – arithmetic and numerical computation
A.0.5 Use calculators to find and use power, exponential and logarithmic functions
Candidates may be tested on their ability to: estimate the number of bacteria grown over a certain length of time
From (ii) A.2 – algebra
A.2.5 Use logarithms in relation to quantities that range over several orders of magnitude
Candidates may be tested on their ability to: use a logarithmic scale in the context of microbiology, e.g. growth rate of a microorganism such as yeast
Powers of 10 There is a newer version called Cosmic Voyage, narrated by Morgan Freeman. It, however, does not have the standard notation included.
Standard form and powers
ACTIVITY Link to task
VERY LARGE AND VERY SMALL Negative and positive standard form
VERY LARGE AND VERY SMALL
EXPONENTIAL GROWTH AND DECAY (Before we can look at logarithms we need to deal with exponential growth and decay)
ACTIVITY Exponentials Take an A4 sheet of paper. How many times can you fold it in half?
MYTHBUSTERS
How many layers do you produce? (HANDOUT)
Table of results, with columns: Number of folds (x) | Number of layers (y) | Mult | Power | Height (cm)
Example rows (Mult column): 3 folds: 2*2*2; 4 folds: 2*2*2*2
Plot your values on graph paper:
Exponential growth Imagine you contracted a virus (such as SARS) where you infected the first five people that you met, and they each infected the first five people that they met and so on…. There are 186,701 people living in Southampton. How many interactions would it take until everyone was infected?
How many infections? Table of results, with columns: Number of interactions (x) | Number of infected people (y)
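The infection question can be explored with a short script. This is only a sketch: it assumes the simplest reading of the task, in which the number of people infected at interaction x is y = 5^x (cumulative totals are ignored), and the function name is purely illustrative.

```python
# Sketch: count interactions until a y = 5**x infection model passes a population.
# Assumption: y is the number of people infected at interaction x, so y = 5**x.

def interactions_needed(population: int, base: int = 5) -> int:
    """Return the smallest x for which base**x is at least `population`."""
    x = 0
    infected = 1  # base**0, i.e. the original carrier
    while infected < population:
        x += 1
        infected *= base
    return x

SOUTHAMPTON_POPULATION = 186_701  # figure quoted on the slide

steps = interactions_needed(SOUTHAMPTON_POPULATION)
print(steps, 5 ** steps)
```

Under this assumption the loop stops at the first power of 5 that reaches the population figure, which makes the point that very few interactions are needed.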
General form: $y = b^x$, where b is the base and x is the power (or exponent)
The exponential graph ACTIVITY: Use Desmos or GeoGebra online to graph
How many layers do you produce? Table of results, with columns: Number of folds (x) | Number of layers (y) | Height (cm). The number of layers follows $y = 2^x$
Common features of $y = b^x$
–all curves pass through (0,1)
–exponential growth (and decay) takes place very rapidly
–b > 0 and b ≠ 1
–b > 1 has a positive gradient (PLOT THIS!)
–0 < b < 1 has a negative gradient (PLOT THIS!)
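These features can be checked numerically before plotting. The sketch below tabulates y = b^x for one growth base and one decay base; the bases 2 and 0.5 and the x values are arbitrary choices for illustration.

```python
# Sketch: tabulate y = b**x for a growth base (b > 1) and a decay base (0 < b < 1).

def exponential(b: float, x: float) -> float:
    """Evaluate y = b**x."""
    return b ** x

xs = [-2, -1, 0, 1, 2, 3]
growth = [exponential(2, x) for x in xs]    # b > 1: values rise as x increases
decay = [exponential(0.5, x) for x in xs]   # 0 < b < 1: values fall as x increases

for x, g, d in zip(xs, growth, decay):
    print(x, g, d)
```

Both columns equal 1 at x = 0, which is the shared point (0,1).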
HANDOUT Exponential growth and decay worksheet. Exponential growth and decay practice sheet (and answers). We are not doing all of these during the session.
Logarithms Logarithm is another name for a power. So let’s say you know there are 32 layers in the folding task. How many times has someone made a fold? You could say taking ‘logarithms’ is the opposite of exponential growth or decay. Exponential form: $2^5 = 32$. Logarithmic form: $\log_2 32 = 5$.
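The folding question can also be answered on a computer rather than a calculator. A minimal sketch using Python's standard `math` module; the variable names are illustrative only.

```python
import math

# Folding task: 32 layers were produced; how many folds does that correspond to?
layers = 32
folds = math.log2(layers)   # logarithm base 2 undoes y = 2**x
print(folds)

# The same idea in base 10, the base used on log paper: 10**3 = 1000
power_of_ten = math.log10(1000)
print(power_of_ten)
```

Raising 2 to the power of the result gives back the number of layers, which is the sense in which logarithms are the opposite of exponentials.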
Log examples: positive numbers
ACTIVITY In pairs decide whether or not each statement (selection)
Further practice There is further practice of conversion between logs and powers on: and the Logarithms practice sheet
Why do we even need log paper? The exponential graph ACTIVITY: Now with log paper. Demonstrate with GeoGebra
Log paper Log paper and powers of Step by step example Use of calculator Most tools are rather poor at log paper!
TASK LOG PAPER – SARS – PLOT
Table columns: Number of interactions (x) | Number of people with the disease (y)
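One way to see why log paper helps: if the number of people with the disease is assumed to follow y = 5^x (an assumption taken from the earlier SARS task), then log10(y) goes up by the same amount at every interaction, so the points fall on a straight line. A small sketch:

```python
import math

# Assumption: y = 5**x, as in the SARS task; six interactions chosen for illustration.
interactions = [1, 2, 3, 4, 5, 6]
infected = [5 ** x for x in interactions]
log_infected = [math.log10(y) for y in infected]

# Difference between consecutive log values: constant for exponential data
steps = [b - a for a, b in zip(log_infected, log_infected[1:])]

for x, y, log_y in zip(interactions, infected, log_infected):
    print(x, y, round(log_y, 3))
print([round(s, 3) for s in steps])
```

The constant step is log10(5), which is the gradient of the straight line seen on log paper.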
PEDAGOGY How would you teach these topics?
From sample exam question
Discuss exam question
Lunch break
Statistical tests These slides partly rely on the excellent resources from SteveJ64 from the TES website
Slide exam curriculum
From (i) A.1 – handling data
A.1.9 Select and use a statistical test
Candidates may be tested on their ability to select and use:
–the Chi squared test to test the significance of the difference between observed and expected results
–the Student’s t-test
–the correlation coefficient
A.1.11 Identify uncertainties in measurements and use simple techniques to determine uncertainty when data are combined
Candidates may be tested on their ability to: calculate percentage error where there are uncertainties in measurement
Normal distribution
Suppose we have a crate of apples which are to be sorted by weight into small, medium and large. If we wanted 25% to be in the large category, we would need to know the lowest weight a “large” apple could be. To solve a problem like this we can use a statistical model. A model often used for continuous quantities such as weight, volume, length and time is the Normal Distribution. The Normal Distribution is an example of a probability model.
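The apple problem can be sketched with Python's standard `statistics.NormalDist`. The slides give no mean or standard deviation for the apples, so the values below (150 g and 20 g) are hypothetical and chosen only to make the example run.

```python
from statistics import NormalDist

# HYPOTHETICAL parameters: the slides do not give the mean or s.d. of apple weights.
mean_weight_g = 150.0
sd_weight_g = 20.0

apples = NormalDist(mu=mean_weight_g, sigma=sd_weight_g)

# The top 25% are "large", so the cut-off is the weight with 75% of apples below it.
large_cutoff_g = apples.inv_cdf(0.75)
print(round(large_cutoff_g, 2))
```

With real data the two parameters would be estimated from the sample; the rest of the calculation is unchanged.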
The Normal Distribution curve
Characteristics of the Normal Distribution: The Normal distribution model is a symmetric bell-shaped curve. We fit it as closely as possible to the data. To fit the curve we use the mean, μ, and variance, σ², of the data. These are the parameters of the model. If X is the random variable “the weight of apples”, we write $X \sim N(\mu, \sigma^2)$. Reminder: σ is the standard deviation.
e.g. The axis of symmetry of the Normal distribution passes through the mean.
e.g. A smaller variance “squashes” the distribution closer to the mean.
SUMMARY The percentages of the Normal Distribution lying within the given number of standard deviations either side of the mean are approximately:
1 s.d.: 68%
2 s.d.: 95%
3 s.d.: 99.7%
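These approximate percentages can be checked against the standard Normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def proportion_within(k: float) -> float:
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * proportion_within(k), 1))
```

Because every Normal distribution has the same shape once rescaled, the result does not depend on the mean or variance of the data.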
Statistical tests
Type of data collected
–Measurements
–Frequencies
What are you looking for?
–Associations
–Differences
MINI WORKSHOPS Break up in smaller groups to study a particular stats topic Slides and materials are available Sample exam questions can also help guide the work At end we come back together and share knowledge and experiences
Chi-squared (χ²) test
KARL PEARSON ( ) British mathematician, ‘father’ of modern statistics and a pioneer of eugenics! (Pearson’s)
Chi-squared (χ²) test This test compares measurements relating to the frequency of individuals in defined categories e.g. the numbers of white and purple flowers in a population of pea plants. Chi-squared is used to test if the observed frequency fits the frequency you expected or predicted.
How do we calculate the expected frequency? You might expect the observed frequency of your data to match a specific ratio, e.g. a 3:1 ratio of phenotypes in a genetic cross. Or you may predict a homogeneous distribution of individuals in an environment, e.g. numbers of daisies counted in quadrats on a field. Note: In some cases you might expect the observed frequencies to match the expected, in others you might hope for a difference between them.
Example 1: GENETICS Comparing the observed frequency of different types of maize grains with the expected ratio calculated using a Punnett square.
The photo shows four different phenotypes for maize grain, as follows: Purple & Smooth (A), Purple & Shrunken (B), Yellow & Smooth (C) and Yellow & Shrunken (D)
The Punnett square below shows the expected ratio of phenotypes from crosses of four genotypes of maize.
Gametes | PS | Ps | pS | ps
PS | PPSS | PPSs | PpSS | PpSs
Ps | PPSs | PPss | PpSs | Ppss
pS | PpSS | PpSs | ppSS | ppSs
ps | PpSs | Ppss | ppSs | ppss
A : B : C : D = 9 : 3 : 3 : 1
What is the null hypothesis (H0)?
H0 = there is no statistically significant difference between the observed frequency of maize grains and the expected frequency (the 9:3:3:1 ratio)
HA = there is a significant difference between the observed frequency of maize grains and the expected frequency
If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
How: critical value, P-value
For the tests, do not do all step-by-step recipes. Use the datasets from the separate presentations to ask ‘what test would be appropriate here’.
Central: chi squared; mini workshops:
–T-test
–Spearman rank
–SE and confidence intervals
Mini workshop interpretation and reporting.
Calculating χ²
$\chi^2 = \sum \frac{(O - E)^2}{E}$
O = the observed results
E = the expected (or predicted) results
Table columns: Phenotype | O | E (9:3:3:1) | O–E | (O–E)²/E
Rows: A, B, C, D; total O = 433; χ² = 7.91
Compare your calculated value of χ² with the critical value in your stats table (critical values at P = 0.05, listed by degrees of freedom)
Our value of χ² = 7.91
Degrees of freedom = no. of categories – 1 = 3
Our value for χ² exceeds the critical value, so we can reject the null hypothesis. There is a significant difference between our expected and observed ratios, i.e. they are a poor fit.
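The expected frequencies and the comparison step can be sketched in code. The total of 433 grains, the 9:3:3:1 ratio and the χ² value of 7.91 come from the slides; the individual observed counts are not reproduced here, so the `chi_squared` function is shown without them. The critical value 7.815 is the standard tabulated value for 3 degrees of freedom at P = 0.05. Function names are illustrative.

```python
# Sketch: expected frequencies from a ratio, and the chi-squared statistic.

def expected_from_ratio(total, ratio):
    """Split `total` individuals according to `ratio`, e.g. [9, 3, 3, 1]."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def chi_squared(observed, expected):
    """Sum of (O - E)**2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = expected_from_ratio(433, [9, 3, 3, 1])
print(expected)

CRITICAL_DF3_P05 = 7.815   # tabulated critical value, 3 d.f., P = 0.05
chi2_from_slide = 7.91     # value quoted on the slide
reject_null = chi2_from_slide > CRITICAL_DF3_P05
print(reject_null)
```

Calling `chi_squared(observed, expected)` with the four observed counts from the handout would reproduce the quoted statistic.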
Example 2: ECOLOGY One section of a river was trawled and four species of fish counted and frequencies recorded. The expected frequency is equal numbers of the four fish species to be present in the sample.
What is the null hypothesis (H0)?
H0 = there is no statistically significant difference between the observed frequency of fish species and the expected frequency.
HA = there is a significant difference between the observed frequency of fish and the expected frequency
If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
Calculating χ²
$\chi^2 = \sum \frac{(O - E)^2}{E}$
O = the observed results
E = the expected (or predicted) results
Table columns: Species | O | E | O–E | (O–E)²/E
Rows: Rudd, Roach, Dace, Bream; total O = 40; χ² = 10.2
Compare your calculated value of χ² with the critical value in your table of critical values (critical values at P = 0.05, listed by degrees of freedom).
Our value of χ² = 10.2
Degrees of freedom = no. of categories – 1 = 3
Our value for χ² exceeds the critical value, so we can reject the null hypothesis. There is a significant difference between our expected and observed frequencies of fish species.
Example 3: CONTINGENCY TABLES You can use contingency tables to calculate expected frequencies when the relationship between two quantities is being investigated. In this example we will look at the incidence of colour blindness in both males and females.
What is the null hypothesis (H0)?
H0 = there is no statistically significant difference between the observed frequency of colour blindness in males and females.
HA = there is a significant difference between the observed frequency of colour blindness in males and females
If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
Observed frequencies: Colour blind: Males 56, Females 14; Not colour blind: Males and Females counted in the same way
Expected Cell Frequency = (Row Total x Column Total) / n
e.g. The expected frequency for colour blind males = (row total x column total) / 1360 = 42
Expected frequencies: Colour blind: Males 42, Females 28; Not colour blind: calculated in the same way
For each cell calculate (O – E)² / E, then add the four values:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Compare your calculated value of χ² with the critical value in your table of critical values (critical values at P = 0.05, listed by degrees of freedom)
Degrees of freedom = (2 rows – 1) x (2 cols – 1) = 1
Our value for χ² exceeds the critical value, so we can reject the null hypothesis. There is a significant difference between our expected and observed frequencies. The fraction of males with colour blindness is greater than that in females. The difference cannot be attributed to chance alone.
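The contingency-table calculation can be sketched as below. Only the colour blind counts (56 and 14), the total n = 1360 and the expected values 42 and 28 are given on the slides; the "not colour blind" counts of 760 and 530 used here are an assumption, inferred so that the row and column totals reproduce those expected values. Function names are illustrative.

```python
# Sketch: expected frequencies and chi-squared for a 2 x 2 contingency table.

observed = [
    [56, 14],     # colour blind: males, females (from the slide)
    [760, 530],   # not colour blind: ASSUMED, inferred from n = 1360 and E = 42
]

def expected_table(table):
    """Expected cell frequency = (row total x column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table, expected):
    """Sum of (O - E)**2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for row_o, row_e in zip(table, expected)
        for o, e in zip(row_o, row_e)
    )

expected = expected_table(observed)
chi2 = chi_squared(observed, expected)
print(expected, round(chi2, 2))

CRITICAL_DF1_P05 = 3.841   # tabulated critical value, 1 d.f., P = 0.05
print(chi2 > CRITICAL_DF1_P05)
```

Under the assumed counts the statistic is well above the tabulated critical value for 1 degree of freedom, in line with the conclusion on the slide.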
Spearman rank
Charles Spearman ( ) British Psychologist and pioneer of IQ theory
Spearman’s Rank Correlation Coefficient
What can this test tell you?
–Whether there is a statistically significant correlation between two measurements from the same sample when you have 5-30 pairs of data.
–If the correlation is negative or positive
What is the correlation coefficient r ?
Does being good at maths make you better at biology?
Student | Maths exam score | Biology exam score
Alex | 57 | 83
Bernard | 45 | 37
Charlotte | 72 | 41
Demi | 78 | 85
Eustace | 53 | 56
Ferdinand | 63 | 85
Gemma | 86 | 77
Hector | 98 | 87
Ivor | 59 | 70
Jasmine | 71 | 59
Is there a statistically significant correlation between these two sets of results?
Spearman’s Rank Correlation Coefficient: r_s
$r_s = 1 - \frac{6 \sum D^2}{N^3 - N}$
Where: N = the number of individuals in the sample; D = difference in the rank of the two measurements made on an individual
r_s will be a number between –1 and +1. This number can be compared with those in a table of critical values.
H0 = there is no statistically significant correlation between Maths scores and Biology scores
HA = there is a statistically significant correlation between Maths scores and Biology scores
A negative value for r_s implies a negative correlation
A positive value for r_s implies a positive correlation
If the value for r_s exceeds the critical value, then you can reject the null hypothesis
Step 1: Rank each set of data (lowest to highest). Where two or more scores are tied, each is assigned an average rank.
Student | Maths exam score | Maths rank | Biology exam score | Biology rank
Alex | 57 | 3 | 83 | 7
Bernard | 45 | 1 | 37 | 1
Charlotte | 72 | 7 | 41 | 2
Demi | 78 | 8 | 85 | 8.5
Eustace | 53 | 2 | 56 | 3
Ferdinand | 63 | 5 | 85 | 8.5
Gemma | 86 | 9 | 77 | 6
Hector | 98 | 10 | 87 | 10
Ivor | 59 | 4 | 70 | 5
Jasmine | 71 | 6 | 59 | 4
Step 2: Work out the differences in ranks, D (maths – biology)
Step 3: Work out the square of the differences, D²
Step 4: Work out the sum of the square of the differences: ∑D² = 68.5
Step 5: Work out the correlation coefficient, r_s
$r_s = 1 - \frac{6 \sum D^2}{N^3 - N}$
Where: N = 10, ∑D² = 68.5
$r_s = 1 - \frac{6 \times 68.5}{10^3 - 10} = 1 - \frac{411}{990} = 1 - 0.415 = 0.585$
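Steps 1 to 5 can be sketched in code using the maths and biology scores from the earlier slide. The function names are illustrative; tied scores receive the average of the ranks they occupy, as described in Step 1, and the simplified formula is used exactly as on the slides.

```python
# Sketch of Steps 1-5: rank both score lists, then apply
# r_s = 1 - 6 * sum(D**2) / (N**3 - N).

maths = [57, 45, 72, 78, 53, 63, 86, 98, 59, 71]
biology = [83, 37, 41, 85, 56, 85, 77, 87, 70, 59]

def average_ranks(values):
    """Rank lowest to highest; tied values share the average of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # 1-based position of the first tied value
        ties = ordered.count(v)        # how many scores share this value
        ranks.append(first + (ties - 1) / 2)
    return ranks

def spearman_rs(xs, ys):
    """Return (sum of squared rank differences, r_s)."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return sum_d2, 1 - (6 * sum_d2) / (n ** 3 - n)

sum_d2, rs = spearman_rs(maths, biology)
print(sum_d2, round(rs, 3))
```

The two biology scores of 85 share rank 8.5, which is why the sum of squared differences is not a whole number.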
Step 6: Compare your calculated value of r_s with the relevant critical value in your stats table (r_s critical values at P = 0.05, listed by number of pairs)
For N = 10 and P = 0.05, the critical value of r_s is 0.65
Our value of r_s is 0.585
Because this is below the critical value, we must accept H0
There is no statistically significant correlation between Maths scores and Biology scores
Mad Geoff’s Chaotic Firework Factory Do people’s stress levels increase the closer they live to Mad Geoff’s Chaotic Firework Factory? Cortisol is a stress hormone. The more stressed an individual is, the higher their blood cortisol levels will be. (Map: Acacia Ave.)
Table columns: Resident | Address | Blood cortisol level (μg/ml)
First row: Karl (Caretaker) | 2 (Factory) | 13.4
Other residents: Lillie, Melanie, Nigel, Olga, Peter, Quentin, Rajesh, Susan, Toni, Uri, Vanessa
H0 = there is no statistically significant correlation between proximity to the fireworks factory and blood cortisol levels
Table columns: Resident | Address | Address rank | Blood cortisol level (μg/ml) | Cortisol rank | D | D²
Karl | 2 | 13.4
Lillie | 8 | 22.6
Susan | 18 | 9.8
Uri | 18 | 8.8
Vanessa | 20 | 7.5
Other residents: Melanie, Nigel, Olga, Peter, Quentin, Rajesh, Toni
Final row: ∑D²
$r_s = 1 - \frac{6 \sum D^2}{N^3 - N}$
Where: N = 12, ∑D² = 513
$r_s = 1 - \frac{6(513)}{12^3 - 12} = 1 - \frac{3078}{1716} = 1 - 1.794 = -0.794$
When comparing r_s to the critical value, ignore the sign on r_s
For N = 12 and P = 0.05, look up the critical value of r_s in the table of critical values
Our value of r_s is –0.794
Because this is above the critical value, we can reject H0
There is a statistically significant correlation between proximity to the fireworks factory and blood cortisol levels
r_s is negative so there is a negative correlation. Therefore: the further one lives from the firework factory, the lower one’s blood cortisol levels.
Pearson
KARL PEARSON ( ) British mathematician, ‘father’ of statistics and a pioneer of eugenics!
Pearson’s Correlation Coefficient
What can this test tell you?
–If there is a statistically significant correlation between two measured variables, X and Y, and…
–If that correlation is negative or positive
Note: The data must be normally distributed
What is the correlation coefficient r ?
Is there a significant correlation between an animal’s nose-to-tail length and its body mass?
Table columns: Animal | Mass (arbitrary units) | Length (arbitrary units)
If yes, then is the correlation positive (does a greater length mean a larger mass)?
Pearson’s Correlation Coefficient
$r = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{\{\sum X^2 - (\sum X)^2/n\}\{\sum Y^2 - (\sum Y)^2/n\}}}$
Where: n = the number of values of X and Y
r will always be a number between –1 and +1
This number can be compared with those in a table of critical values using n – 2 degrees of freedom.
What is the null hypothesis (H0)?
H0 = there is no statistically significant correlation between length and body mass
HA = there is a statistically significant correlation between length and body mass
A negative value for r implies a negative correlation
A positive value for r implies a positive correlation
If the value for r exceeds the critical value, then you can reject the null hypothesis.
Construct the following results table (n = 7 animals)
Table columns: Animal | Mass (X) | Length (Y) | X² | Y² | XY, with a Total row and a Mean row
Calculate values for X: ∑X = 37, mean of X = 5.29, ∑X² = 251
Calculate values for Y: ∑Y = 82, ∑Y² = 1278
Calculate values for XY: ∑XY = 553
Use values obtained to populate the equation:
∑X = 37, ∑Y = 82, ∑X² = 251, ∑Y² = 1278, ∑XY = 553, n = 7
$r = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{\{\sum X^2 - (\sum X)^2/n\}\{\sum Y^2 - (\sum Y)^2/n\}}}$
$r = \frac{553 - (37 \times 82)/7}{\sqrt{\{251 - 37^2/7\}\{1278 - 82^2/7\}}} \approx 0.90$
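The substitution can be checked with a few lines of code. This sketch works from the column totals quoted on the slides (the individual animal measurements are not reproduced here) and assumes the standard form of Pearson's formula, with a square root over the product in the denominator. The function name is illustrative.

```python
import math

def pearson_r_from_sums(sum_x, sum_y, sum_x2, sum_y2, sum_xy, n):
    """Pearson's r computed from column totals rather than raw data."""
    numerator = sum_xy - (sum_x * sum_y) / n
    spread_x = sum_x2 - sum_x ** 2 / n
    spread_y = sum_y2 - sum_y ** 2 / n
    return numerator / math.sqrt(spread_x * spread_y)

# Totals quoted on the slides for the n = 7 animals
r = pearson_r_from_sums(sum_x=37, sum_y=82, sum_x2=251, sum_y2=1278,
                        sum_xy=553, n=7)
degrees_of_freedom = 7 - 2
print(round(r, 3), degrees_of_freedom)
```

Working from totals mirrors the table built up on the slides, where each column is summed before the formula is applied.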
Compare your calculated value of r with the relevant critical value in your stats table (critical values at P = 0.05, listed by degrees of freedom)
Our value of r ≈ 0.90
Degrees of freedom = n – 2 = 5
Our value for r exceeds the critical value, so we can reject the null hypothesis. The + sign shows that any correlation is positive. We can conclude that there is a significant positive correlation between the length of an animal and its body mass, i.e. a greater nose-to-tail length is associated with a larger body mass!
Two sample t-test
William Gosset (aka ‘Student’) ( ) Worked in quality control at the Guinness brewery and could not publish under his own name. Former student of Karl Pearson
The t-test
What can this test tell you? If there is a statistically significant difference between two means, when:
–The sample size is less than 25.
–The data are normally distributed
t-test
$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}}$
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
$\bar{x}_1$ = mean of first sample
$\bar{x}_2$ = mean of second sample
$s_1$ = standard deviation of first sample
$s_2$ = standard deviation of second sample
$n_1$ = number of measurements in first sample
$n_2$ = number of measurements in second sample
Worked example Does the pH of soil affect seed germination of a specific plant species? Group 1: eight pots with soil at pH 5.5. Group 2: eight pots with soil at pH 7.0. Seeds were planted in each pot and the number that germinated in each pot was recorded.
What is the null hypothesis (H0)?
H0 = there is no statistically significant difference between the germination success of seeds in two soils of different pH
HA = there is a significant difference between the germination of seeds in two soils of different pH
If the value for t exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
Construct the following table…
Table columns: Pot | Group 1 (pH 5.5) | (x – x̄)² | Group 2 (pH 7.0) | (x – x̄)², with one row per pot and a Mean row
Calculate standard deviation for both groups
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, with n = 8 for each group
Group 1: SD = 2.36
Group 2: SD = 3.42
Using your means and SDs, calculate value for t
$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}} = \frac{|39.1 - 43.5|}{\sqrt{(2.36^2/8) + (3.42^2/8)}} = 2.99$
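The t calculation can be reproduced from the summary statistics on the slides (means 39.1 and 43.5, standard deviations 2.36 and 3.42, eight pots per group). This is a sketch only: the function name is illustrative, and the absolute difference of the means is used so that t is reported as a positive number, which is an assumption based on the positive value quoted in the worked example.

```python
import math

def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t = |mean1 - mean2| / sqrt(sd1**2 / n1 + sd2**2 / n2)."""
    return abs(mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Summary statistics quoted in the worked example
t = t_statistic(mean1=39.1, sd1=2.36, n1=8, mean2=43.5, sd2=3.42, n2=8)
degrees_of_freedom = 8 + 8 - 2
print(round(t, 3), degrees_of_freedom)

CRITICAL_DF14_P05 = 2.145   # tabulated two-tailed critical value, 14 d.f., P = 0.05
print(t > CRITICAL_DF14_P05)
```

The result agrees with the value quoted in the worked example to within rounding of the standard deviations.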
Compare our calculated value of t with the relevant critical value in the stats table of critical values (critical values at P = 0.05, listed by degrees of freedom)
Our value of t = 2.99
Degrees of freedom = n1 + n2 – 2 = 14
Our value for t exceeds the critical value, so we can reject the null hypothesis. We can conclude that there is a significant difference between the two means, so pH does affect the germination rate for this plant.
Standard error and confidence limits
Standard Error with 95% Confidence Limits
What can this test tell you? If there is a statistically significant difference between two means, when:
–The sample size is at least 30.
–The data are normally distributed.
NB: You can use this test to assess up to 5 means on the same graph.
Worked example A student investigated the variation in the length of mussel shells at two different locations on a rocky shore. The student measured the shell length of 30 mussels at each location.
What is the null hypothesis (H0)?
H0 = there is no statistically significant difference between the means of the two samples of mussels
HA = there is a significant difference between the means of the two samples of mussels
If the 95% confidence limits around the means do not overlap, then you can reject the null hypothesis.
Step 1: Calculate SD for both groups
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, with n = 30 for each group
Group 1: SD = 11.2
Group 2: SD = 7.0
Step 2: Calculate the SE for both groups
$SE = \frac{SD}{\sqrt{n}}$
Group 1: SE = 11.2 / √30 = 2.04
Group 2: SE = 7.0 / √30 = 1.28
Be careful with rounding.
Step 3: Calculate the 95% confidence limits: Mean ± 2 x SE
Group 1: Upper limit = 61 + (2 x 2.04) = 65.08; Lower limit = 61 – (2 x 2.04) = 56.92
Group 2: Upper limit = 33 + (2 x 1.28) = 35.56; Lower limit = 33 – (2 x 1.28) = 30.44
Note: recent exam specifications may use 1.96 instead of 2 (more precise)
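Steps 2 and 3 can be sketched as follows, using the means (61 and 33), standard deviations (11.2 and 7.0) and sample size (30) from the worked example. The function names are illustrative, and the multiplier 2 follows the slides; 1.96 could be passed instead. The sketch keeps the unrounded SE, so the limits may differ from hand-calculated ones in the second decimal place.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """SE = SD / sqrt(n)."""
    return sd / math.sqrt(n)

def confidence_limits(mean: float, se: float, multiplier: float = 2.0):
    """Return (lower, upper) as mean -/+ multiplier x SE."""
    return mean - multiplier * se, mean + multiplier * se

se1 = standard_error(11.2, 30)   # group 1
se2 = standard_error(7.0, 30)    # group 2
lower1, upper1 = confidence_limits(61, se1)
lower2, upper2 = confidence_limits(33, se2)

# The intervals overlap only if each one starts before the other ends
overlap = lower1 <= upper2 and lower2 <= upper1
print(round(se1, 2), round(se2, 2))
print((round(lower1, 2), round(upper1, 2)), (round(lower2, 2), round(upper2, 2)))
print(overlap)
```

If the two intervals do not overlap, the difference between the means is treated as significant.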
Step 4: Plot means and confidence limits (Mean ± 2 x SE)
With no overlap, we can conclude that there is a significant difference between the two means. Shell lengths differ significantly between the two locations.
EXAM QUESTIONS Should guide you quite a lot:
Conclusion
Hopefully some skills, knowledge and confidence added
Please fill in the evaluation form
In a month’s time we would like to send you some follow-up questions:
–Impact form
–Maths confidence
Thank you for your attention.