2 D2 Averages and range
Contents
D2.1 The mode
D2.2 The mean
D2.3 Calculating the mean from frequency tables
D2.4 The median
D2.5 Comparing data
3 The three averages and range
There are three different types of average:
MODE: most common
MEAN: sum of values ÷ number of values
MEDIAN: middle value
The range is not an average, but tells you how the data is spread out:
RANGE: largest value – smallest value
4 The mode
The most common item is called the mode. The mode is the item that occurs the most often in a data set.
In the graph the mode is sprint because it is represented by the highest bar. We could also say “The modal athletic event is sprint.”
Is it possible to have more than one modal value? Yes: there could be two or more events that are equally the most popular.
Is it possible to have no modal value? Yes: all events could be equally popular.
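The idea of one, several or no modal values can be sketched in a few lines of Python. This is a minimal illustration only: the function name `modes` and the survey answers are invented for the example and are not taken from the slide's graph.

```python
from collections import Counter

def modes(items):
    """Return every item that occurs most often (there may be several)."""
    counts = Counter(items)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(item for item, count in counts.items() if count == highest)

# Hypothetical survey answers, invented for illustration only.
one_mode = ["sprint", "hurdles", "sprint", "javelin", "sprint", "hurdles"]
two_modes = ["sprint", "sprint", "javelin", "javelin", "hurdles"]
all_equal = ["sprint", "javelin", "hurdles"]

print(modes(one_mode))   # a single modal event
print(modes(two_modes))  # two events tie for most popular
print(modes(all_equal))  # every event ties, so there is no useful mode
```

When every item is returned, all items are equally popular, which is the "no modal value" case described above.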
5 The mode
These figures show the number of pupils that attended a school athletics club each week.
Discuss:
Over how many weeks were the results collected?
What is the modal number of pupils attending?
Are there any unusual results in the data set?
Very unusual results are called outliers. Can you think of any possible reasons for the outlier in this data set?
If the data set were very large, what would be the best way to find the mode?
These questions could be discussed in pairs. A possible reason for the outlier is that the pupils were on study leave, that it was half term, or that the coach or teacher was absent. It would be sensible to put the results into a tally chart if the data set were very large.
6 How many sports do you play?
A group of pupils were asked how many sports they played. This graph shows the results.
[Bar chart: Frequency against Numbers of sports played]
How many pupils play more than two sports?
What is the modal number of sports played?
How many pupils took part in the survey?
This graph represents numerical data. Some foundation pupils may confuse the frequency with the number of sports, i.e. give 13 as the mode rather than 1. In this situation, you could begin writing out the whole list of results to illustrate the meanings of the numbers on the two axes. Pupils might want to discuss why so many people play no sports at all and whether this is a good thing.
7 Grouped data
This graph represents girls’ times for a 100 m sprint race.
[Bar chart: Frequency against Times in seconds, from 12 to 20]
What is the modal time interval?
How many girls are in this interval?
The modal group is 16 ≤ t < 17 seconds, although you might not want to introduce this notation yet, depending on the level of your pupils. Explain that the numbers at the left-hand side of each bar are included in the bars, but the numbers on the right are not. For example, for the modal group, 16 seconds exactly is included but 17 seconds would be included in the next group. This is an example of continuous data, so the bars are joined together. Discuss how accurately the times might have been measured, e.g. to the nearest tenth or hundredth. You might also want to discuss the shape of the graph. Why does it peak in the middle and taper off at each end?
8 When the mode is not appropriate
Another survey is carried out among university students. The results are represented in this table:
Numbers of sports played    Frequency
0                           20
1                           17
2                           15
3                           10
4                           9
5                           3
6                           2
A newspaper reporter writes: “You may be surprised to learn that the average number of sports played by university students is 0.”
Do you think this is a fair representation of the data?
Why is the mode a misleading average in this example?
Should the reporter say which average has been used?
It is not fair because the mode does not show that most of the students play 1 or more sports. (You could ask them to work out exactly what percentage this is.) Pupils may suggest that the mean or the median is a fairer way of representing the data. This will be covered on a later slide.
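The percentage mentioned in the note can be checked with a short sketch. It assumes the frequencies 20, 17, 15, 10, 9, 3 and 2 for 0 to 6 sports, which are the values used in the frequency-table slides of this deck for the university survey; the variable names are illustrative.

```python
# Frequencies for 0 to 6 sports, assumed to be the university survey
# used in the frequency-table slides of this deck.
frequency = {0: 20, 1: 17, 2: 15, 3: 10, 4: 9, 5: 3, 6: 2}

total = sum(frequency.values())                  # number of students surveyed
modal_value = max(frequency, key=frequency.get)  # value with the highest frequency
at_least_one = total - frequency[0]              # students playing 1 or more sports
percent_at_least_one = 100 * at_least_one / total

print(modal_value)
print(round(percent_at_least_one, 1))
```

Under these assumptions the mode is 0, yet roughly three quarters of the students play at least one sport, which is why the reporter's statement is misleading.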
9 Skewed data
Data that is heavily weighted towards one end of the data set is said to be skewed. When data is skewed, the mode is not an appropriate average.
[Bar chart: Positively skewed data; Frequency against Numbers of sports played]
[Bar chart: Negatively skewed data; Frequency against Numbers of sports played]
Ask pupils to say what the modal number of sports played is for each graph, and to explain why it does not represent the data very fairly.
11 Comparing data
St Clement Danes School holds an inter-form athletics competition. Each class must select their five best boys and five best girls for each event.
Here are the times in seconds for the 100 m sprint for the two best classes.
Groups: 10C boys, 10C girls, 10B boys, 10B girls
Times: 13.1 16.5 14.3 15.4 16.4 13.8 12.8 15.9 13.4 15.3 22.2 14.1 71.1244.7 12.0 14.9 11.5
Which class should win and why?
The data provided is taken from Year 10 in 2004 at St Clement Danes School, Hertfordshire. Times have been rounded to the nearest tenth. The discussion should involve looking at boys’ and girls’ scores separately and together. The next three slides show how to calculate the means.
12 The mean
The mean is the most commonly used average. To calculate the mean of a set of values we add together the values and divide by the total number of values.
Mean = sum of values ÷ number of values
For example, the mean time for Class 10B girls is: 73.6 ÷ 5 = 14.72 seconds
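The rule above can be written as a one-line function. This is a small sketch: the function name `mean` and the three-value list are illustrative, while the total of 73.6 seconds over 5 runners comes from the worked example for Class 10B girls.

```python
def mean(values):
    """Add the values together and divide by how many there are."""
    return sum(values) / len(values)

# Hypothetical list, invented for illustration only.
print(mean([14.0, 15.0, 16.0]))

# Class 10B girls: the five times add up to 73.6 seconds.
girls_10b_mean = 73.6 / 5
print(round(girls_10b_mean, 2))
```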
13 The mean
Calculate the mean times for the other three groups.
Group        Mean time
10C boys     12.64
10C girls    15.78
10B boys     13.18
10B girls    14.72
Now calculate means for Class 10B and Class 10C (with girls and boys combined).
Class        Mean time
Class 10C    14.21
Class 10B    13.95
Based on these results, who should win?
Discuss an appropriate level of accuracy. Pupils would benefit from a print out of the data.
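Because each group has five runners, a class mean can be found from the two group means. The sketch below assumes the group means are assigned as 10B girls 14.72, 10B boys 13.18, 10C girls 15.78 and 10C boys 12.64; this assignment is inferred from the class means of 13.95 and 14.21 rather than stated outright, and `combined_mean` is an illustrative name.

```python
def combined_mean(means, sizes):
    """Combine group means, weighting each by the size of its group."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)

# Assumed assignment of the slide's means to groups (five runners each).
class_10b = combined_mean([14.72, 13.18], [5, 5])  # 10B girls, 10B boys
class_10c = combined_mean([15.78, 12.64], [5, 5])  # 10C girls, 10C boys

print(round(class_10b, 2))
print(round(class_10c, 2))
```

With equal group sizes this is the same as averaging the two group means; the weighting only matters when the groups differ in size.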
14 Calculating the mean
Pupils could make their own clouds too. Working from the mean and the number of items, they should generate their own list of numbers to fit.
15 Calculating a missing data item
Pupils could hide one of the numbers in their own cloud, and have a partner work it out.
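One way to work out a hidden number is to multiply the mean by the number of items to get the total, then subtract the numbers that are still visible. The sketch below uses a made-up cloud of five numbers with a mean of 6; the function name `missing_value` and the numbers are illustrative.

```python
def missing_value(target_mean, count, known_values):
    """Find the hidden value; count includes the hidden item."""
    required_total = target_mean * count
    return required_total - sum(known_values)

# Hypothetical cloud: five numbers with a mean of 6, one of them hidden.
visible = [4, 9, 5, 7]
hidden = missing_value(6, 5, visible)
print(hidden)  # required total 30, visible total 25
```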
16 Outliers and their effect on the mean
The school athletics team takes part in an inter-school competition. James’s shot results (in metres) are below.
Discuss:
What is the mean throw?
Is this a fair representation of James’s ability? Explain.
What would be a fair way for the competition to operate?
A data item that is significantly higher or lower than the other items is called an outlier. Outliers can increase or reduce the mean dramatically, making it a less accurate measure of the data.
The mean is the total of the seven throws ÷ 7 = 8.81 m. The best throw could be used, or the worst score could be classed as an outlier and removed.
17 Outliers and their effect on the mean
Here are some 1500 metre race results in minutes.
Discuss:
Are there any outliers?
Will the mean be increased or reduced by the outlier?
Calculate the mean with the outlier.
Now calculate the mean without the outlier. How much does it change?
The mean with the outlier is 59.1 ÷ 9 = 6.57 minutes. The mean without the outlier is the total of the remaining eight results ÷ 8 = 6.06 minutes.
Another example of when an outlier would occur is an experiment on reaction time, where an anomalous result (e.g. if the participant’s hand slips) will be very large compared with the rest of the results. It may be appropriate in research or experiments to remove an outlier before carrying out analysis of results.
18 D2.3 Calculating the mean from frequency tables
19 Calculating the mean from a frequency table
Here are the results of a survey carried out among university students.
Numbers of sports played    Frequency
0                           20
1                           17
2                           15
3                           10
4                           9
5                           3
6                           2
If you were to write out the whole list of results, what would it look like?
What do you think the mean will be?
Pupils may benefit from calculating the mean from the list before they are ready to appreciate the multiplication method.
20 Calculating the mean from a frequency table
Numbers of sports played    Frequency    Number of sports × frequency
0                           20           0 × 20 = 0
1                           17           1 × 17 = 17
2                           15           2 × 15 = 30
3                           10           3 × 10 = 30
4                           9            4 × 9 = 36
5                           3            5 × 3 = 15
6                           2            6 × 2 = 12
TOTAL                       76           140
Mean = 140 ÷ 76 = 2 sports (to the nearest whole number)
Ask pupils to estimate the mean first. Discuss a suitable level of accuracy for rounding off in the context of discrete data.
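The multiplication method in the table can be sketched as follows. The values and frequencies are those in the table; the variable names are illustrative.

```python
# (number of sports, frequency) pairs from the table above.
table = [(0, 20), (1, 17), (2, 15), (3, 10), (4, 9), (5, 3), (6, 2)]

total_frequency = sum(freq for _, freq in table)          # number of students
total_sports = sum(value * freq for value, freq in table)  # sum of value × frequency
mean_sports = total_sports / total_frequency

print(total_frequency, total_sports)
print(round(mean_sports))  # nearest whole number, as on the slide
```

Multiplying each value by its frequency gives the same total as writing out the whole list of 76 results and adding them up.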
21 Grouped data
Here are the boys’ javelin scores.
Javelin distances in metres    Frequency
5 ≤ d < 10                     1
10 ≤ d < 15                    8
15 ≤ d < 20                    12
20 ≤ d < 25                    10
25 ≤ d < 30                    3
30 ≤ d < 35                    1
35 ≤ d < 40                    1
TOTAL                          36
How is the data different from the previous examples?
The data has been grouped.
How could you calculate the mean from this data?
Because the data is grouped, we do not know individual scores. It is not possible to add up the scores.
22 Midpoints
Javelin distances in metres    Frequency
5 ≤ d < 10                     1
10 ≤ d < 15                    8
15 ≤ d < 20                    12
20 ≤ d < 25                    10
25 ≤ d < 30                    3
30 ≤ d < 35                    1
35 ≤ d < 40                    1
It is possible to find an estimate for the mean. This is done by finding the midpoint of each group.
To find the midpoint of the group 10 ≤ d < 15:
10 + 15 = 25
25 ÷ 2 = 12.5 m
Find the midpoints of the other groups.
The other midpoints are displayed on the next page. Point out the link between the midpoint and the median/mean. Discuss the assumption that the scores within a group are evenly distributed, i.e. half above and half below the midpoint. This is the best assumption to make, although it is obviously not always true. (The greater the data set, the more likely this is to be the case.) Some pupils may point out that 15 is not included in the group 10 ≤ d < 15; however, since the data is continuous, results can get very close to 15, so this will make no difference to an estimated mean.
23 Estimating the mean from grouped data
Javelin distances in metres    Frequency    Midpoint    Frequency × midpoint
5 ≤ d < 10                     1            7.5         1 × 7.5 = 7.5
10 ≤ d < 15                    8            12.5        8 × 12.5 = 100
15 ≤ d < 20                    12           17.5        12 × 17.5 = 210
20 ≤ d < 25                    10           22.5        10 × 22.5 = 225
25 ≤ d < 30                    3            27.5        3 × 27.5 = 82.5
30 ≤ d < 35                    1            32.5        1 × 32.5 = 32.5
35 ≤ d < 40                    1            37.5        1 × 37.5 = 37.5
TOTAL                          36                       695
Estimated mean = 695 ÷ 36 = 19.3 m (to 1 d.p.)
Ask pupils to estimate the mean first. Discuss a suitable level of accuracy for rounding off in the context of continuous data.
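The midpoint method can be sketched in the same way as the frequency-table calculation. The group boundaries and frequencies are those in the table; the names `groups` and `midpoint` are illustrative.

```python
# (lower bound, upper bound, frequency) for each javelin group in the table.
groups = [
    (5, 10, 1), (10, 15, 8), (15, 20, 12), (20, 25, 10),
    (25, 30, 3), (30, 35, 1), (35, 40, 1),
]

def midpoint(lower, upper):
    """Halfway between the two ends of a group."""
    return (lower + upper) / 2

total_frequency = sum(freq for _, _, freq in groups)
total_product = sum(midpoint(lo, hi) * freq for lo, hi, freq in groups)
estimated_mean = total_product / total_frequency

print(total_frequency, total_product)
print(round(estimated_mean, 1))
```

The result is only an estimate, because each throw is treated as if it landed exactly at the midpoint of its group.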
24 How accurate is the estimated mean?
Here are the javelin distances thrown before the data was grouped.
35.00 31.05 28.89 25.60 25.33 24.11 23.50 21.82 21.78 21.77 21.60
21.00 20.70 20.20 20.00 19.50 18.82 17.35 17.31 16.64 15.79 15.75
15.69 15.52 15.25 15.00 14.50 12.80 12.50 12.00 11.85 10.00 9.50
Work out the mean from the original data above and compare it with the estimated mean found from the grouped data.
The estimated mean is 19.3 metres (to 1 d.p.).
The actual mean is 18.7 metres (to 1 d.p.).
How accurate was the estimated mean?
Emphasise that although the estimated mean can be quite accurate it is preferable to use the original data if this is available.
26 Calculate the median of the 1500 m results.
The median is the middle number when all numbers are in order.
Write the results in order and find the middle value.
The median is not affected by the value of the outlier.
Why is this a more appropriate average than the mean for these results?
27 Choosing the most appropriate average
What are the mean and median for these sets of attendance figures for three lunchtime activities?
[Table: attendance figures for Choir, Drama club and Orchestra; surviving values: 17, 18, 19, 20, 21, 22, 23, 25, 28, 29]
Which average is the best one to use when deciding which of the three activities is the most popular? Why?
Explain your answers.
For the drama club and choir, the means and medians are all 20. In the choir, the numbers are evenly distributed around the middle. For the orchestra, the mean is 157 ÷ 7 = 22.4 and the median is 20. The mean is pushed up by the high figures at the end. The mean is better because it takes account of all the data; there is not just one outlier, so it is appropriate to include all data. The orchestra is clearly the most popular on average, but the median does not show this.
28 Outliers and the median and mean
This activity illustrates the way that outliers affect the mean but do not affect the median. For each set of data, say which would be the most appropriate average and why.
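The effect described above can be seen by changing one value in a small data set into an outlier. The times below are invented for illustration and are not taken from the activity; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical race times in minutes, invented for illustration only.
typical = [5.8, 5.9, 6.0, 6.1, 6.2]
with_outlier = [5.8, 5.9, 6.0, 6.1, 10.6]  # the slowest time is now an outlier

print(round(mean(typical), 2), median(typical))
print(round(mean(with_outlier), 2), median(with_outlier))
```

The mean rises from about 6.0 to about 6.88 minutes, while the median stays at 6.0 because the middle value in the ordered list has not moved.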
29 When there are two middle numbers
Here are 10B girls’ long jump results in metres.
How could you work out the median jump?
If there are two middle numbers, you need to find what is halfway between them. Add the two numbers together and then divide by two.
2.80 m + 3.32 m = 6.12 m
6.12 m ÷ 2 = 3.06 m
The median is 3.06 m.
30 Finding halfway between two numbers
Reset to generate different examples. Sometimes it will be the midway number that needs to be found; but on other occasions, this will be given and one of the two endpoints will be hidden.
31 One or two middle numbers?
If there are 9 numbers in a list, will there be 1 or 2 middle numbers?
If there is an odd number of numbers in a list, there will be one middle number.
If there are 10 numbers in a list, will there be 1 or 2 middle numbers?
If there is an even number of numbers in a list, there will be two middle numbers.
Discuss what the median is in each case.
32 When there are two middle numbers
To find out where the middle number is in a very long list, call the number of numbers n. The position of the middle number is then:
(n + 1) ÷ 2
For example:
There are 100 numbers in a list. Where is the median?
101 ÷ 2 = 50.5th number in the list (halfway between the 50th and the 51st).
There are 37 numbers in a list. Where is the median?
38 ÷ 2 = 19th number in the list.
Pupils should first predict whether there will be one or two middle numbers. They should see the connection: when n is even, n + 1 is odd, and an odd number divided by 2 always gives an answer ending in .5, so there will be two middle numbers. Remind pupils that these numbers are not the medians; they are just the positions of the medians in the list.
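The position rule can be turned into a short sketch that also picks out the median itself. The function names `median_position` and `median_of` and the small test lists are illustrative.

```python
def median_position(n):
    """Position of the median in an ordered list of n numbers."""
    return (n + 1) / 2

def median_of(values):
    """Median using the (n + 1) ÷ 2 position rule."""
    ordered = sorted(values)
    position = median_position(len(ordered))
    if position.is_integer():
        # One middle number: positions count from 1, list indexes from 0.
        return ordered[int(position) - 1]
    # Two middle numbers: e.g. position 50.5 lies between the 50th and 51st.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

print(median_position(100))  # halfway between the 50th and 51st numbers
print(median_position(37))   # the 19th number
print(median_of([3, 1, 2]), median_of([4, 1, 3, 2]))
```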
33 Where is the median?
Discuss how to find the median; press reset to use a new data set. Sometimes there will be an odd number and sometimes an even number of items.
35 The range
The highest and lowest scores can be useful in deciding who is more consistent.
The lowest score subtracted from the highest score is called the range.
Remember that the range is not an average, but a measure of spread.
If the scores are spread out then the range will be higher and the scores less consistent.
If the scores are close together then the range will be lower and the scores more consistent.
36 The range
Here are the high jump scores in metres for two girls in five different competitions:
Joanna: 1.62, 1.41, 1.35, 1.20, 1.15
Kirsty: 1.59, 1.45, 1.30
Find the range for each girl’s results and use this to find out who is the most consistent.
Joanna’s range = 1.62 – 1.15 = 0.47
Kirsty’s range = 1.59 – 1.30 = 0.29
Discuss the consistency of the two jumpers: Joanna has the highest score but Kirsty is more consistent.
37 The range
Joanna: 1.62, 1.41, 1.35, 1.20, 1.15
Kirsty: 1.59, 1.45, 1.30
Now calculate the mean for each girl.
          Joanna    Kirsty
Range     0.47 m    0.29 m
Mean      1.35 m    1.41 m
Use these results to decide which one you would enter into the athletics competition and why.
Discuss the consistency of the two jumpers: Joanna has the highest score but Kirsty is more consistent. Kirsty also has a higher mean. Performing well under pressure is a very important skill for athletes, and so Kirsty may be a better choice. You might also want to use words like “reliable” in this context.
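The range and mean for one jumper can be checked with a short sketch. Only Joanna's five scores are used, because they are the complete set listed; the function names `value_range` and `mean` are illustrative.

```python
def value_range(scores):
    """Largest value minus smallest value."""
    return max(scores) - min(scores)

def mean(scores):
    """Sum of the values divided by the number of values."""
    return sum(scores) / len(scores)

# Joanna's high jump scores in metres, as listed on the slide.
joanna = [1.62, 1.41, 1.35, 1.20, 1.15]

print(round(value_range(joanna), 2))  # rounded to avoid floating-point noise
print(round(mean(joanna), 2))
```

Rounding to two decimal places matches the accuracy of the scores and hides the tiny floating-point errors that decimal subtraction can produce.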
38 Calculating the mean, median and range
Each time the activity is reset a new set of data is generated. Calculate the mean, median and range. This could be done as a competition in teams or with mini whiteboards. If required, ask a volunteer to come to the board and use the pen tool to write the given data set in order first.
39 Comparing sets of data
Here is a summary of Chris and Rob’s performance in the 200 metres over a season. They each ran 10 races.
          Chris           Rob
Mean      24.8 seconds    25.0 seconds
Range     1.4 seconds     0.9 seconds
Which of these conclusions are correct?
Rob is more reliable.
Rob is better because his mean is higher.
Chris is better because his range is higher.
Chris must have run a better time for his quickest race.
On average, Chris is faster but he is less consistent.
The first and the last statements are correct. The data on the next page will illustrate why the fourth statement is not always correct. The second statement is not correct because a higher mean means he is slower. The third statement is incorrect because a high range means he is inconsistent.
40 Comparing sets of data
          Chris           Rob
Mean      24.8 seconds    25.0 seconds
Range     1.4 seconds     0.9 seconds
Here is the original data for Chris and Rob.
[Table: the two sets of race times; surviving values: 24.3, 24.4, 24.5, 24.6, 24.9, 25.0, 25.1, 25.2, 25.8]
Use the summary table above to decide which data set is Chris’s and which is Rob’s.
Who has the best time?
Who has the worst time?
The first set of data is Chris’s and the second Rob’s. The data illustrates why the fourth statement is not always correct: Chris has the faster mean time, but Rob has the best time of 24.3 seconds.
41 Comparing hurdles scores
Here are the top eleven hurdles scores in seconds for pupils aged 14 and 15.
Age 14: 12.1, 14.0, 15.3, 15.4, 15.6, 15.7, 16.1, 16.7, 17.0
Age 15: 12.3, 13.7, 15.5, 15.6, 15.9, 16.0, 16.1, 17.1, 22.9
Work out the mean and range.
          Age 14    Age 15
Mean      15.4      16.1
Range     4.9       10.6
Which age group do you think is better and why?
Why might the 15 year olds feel the comparison is unfair?
Discuss the fact that the extreme value 22.9 seconds significantly affects the mean and range for pupils aged 15.
42 Finding the interquartile range
The time of 22.9 seconds is an outlier.
When there are outliers in the data, it is more appropriate to calculate the interquartile range.
The interquartile range is the range of the middle half of the data.
The lower quartile is the data value that is a quarter of the way along the list.
The upper quartile is the data value that is three quarters of the way along the list.
interquartile range = upper quartile – lower quartile
43 Locating the upper and lower quartiles
There are 11 times in each list.
Age 14: 12.1, 14.0, 15.3, 15.4, 15.6, 15.7, 16.1, 16.7, 17.0
Age 15: 12.3, 13.7, 15.5, 15.6, 15.9, 16.0, 16.1, 17.1, 22.9
Where is the median in each list?
Where is the lower quartile in each list?
Where is the upper quartile in each list?
Interquartile range for 14 year olds: 16.1 – 15.3 = 0.8
Interquartile range for 15 year olds: 16.1 – 15.5 = 0.6
The interquartile range for age 15s is smaller than the interquartile range for age 14s.
Establish that if there are 11 data values the median will be the 6th value in the list, leaving 5 values on either side. The lower and upper quartiles are in the middle of these 5 remaining values on either side.
Note that the median, lower quartile and upper quartile will only be actual values in the data set when the number of values in the data set is (4n – 1), where n is a positive whole number. When the number of values is not of this form, it is acceptable to approximate the upper and lower quartiles using the closest value.
Pupils could be asked to construct a box-and-whisker diagram to compare this data.
Link: D4.5 Box-and-whisker diagrams.
44 The location of quartiles in an ordered data set
When there are n values in an ordered data set:
The lower quartile = the ((n + 1) ÷ 4)th value
The median = the ((n + 1) ÷ 2)th value
The upper quartile = the (3(n + 1) ÷ 4)th value
The interquartile range = the upper quartile – the lower quartile
The median, lower quartile and upper quartile can also be estimated from a cumulative frequency graph.
Links: D4.4 Using cumulative frequency graphs, D4.5 Box-and-whisker diagrams.
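The position formulas can be sketched as follows. The function names and the list of eleven times are invented for illustration; the sketch deliberately handles only data sets where the quartile positions are whole numbers and does not attempt the closest-value approximation.

```python
def quartile_positions(n):
    """Positions of the lower quartile, median and upper quartile."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4

def interquartile_range(values):
    """Upper quartile minus lower quartile, for whole-number positions only."""
    ordered = sorted(values)
    lower_pos, _, upper_pos = quartile_positions(len(ordered))
    if not (lower_pos.is_integer() and upper_pos.is_integer()):
        raise ValueError("this sketch only handles 3, 7, 11, ... values")
    # Positions count from 1, list indexes from 0.
    return ordered[int(upper_pos) - 1] - ordered[int(lower_pos) - 1]

# Hypothetical times in seconds, invented for illustration; 23.0 is an outlier.
times = [12.0, 13.5, 15.0, 15.2, 15.5, 15.8, 16.0, 16.4, 16.5, 17.0, 23.0]

print(quartile_positions(len(times)))  # 3rd, 6th and 9th values
print(interquartile_range(times))      # 16.5 – 15.0
print(max(times) - min(times))         # the range, inflated by the outlier
```

For eleven values the positions are the 3rd, 6th and 9th, and the interquartile range ignores the outlier that dominates the ordinary range.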
45 Finding the interquartile range
Use the activity to discuss how to find the interquartile range. Pressing the play button reveals how this is found. Resetting will produce a new data set.
Discuss rounding of the position number: round up for the lower quartile and down for the upper quartile.
Link this activity to box-and-whisker diagrams by also finding the minimum value, the maximum value and the median and asking pupils to construct the corresponding box-and-whisker diagram.
Links: D4.4 Using cumulative frequency graphs, D4.5 Box-and-whisker diagrams.
46 Review
To review the work you have covered in this topic:
1) Play “Guess the word”. Write out the key words on cards. Shuffle the cards. Describe the word on each card to your partner. Your partner must guess the word. Do as many as you can in one minute, then swap over.
2) Make up challenges involving sets of data for your partner, such as working out the mean.
3) Make a list of possible mistakes to avoid in this topic.