D2 Averages and range
Contents:
D2.1 The mode
D2.2 The mean
D2.3 Calculating the mean from frequency tables
D2.4 The median
D2.5 Comparing data
The three averages and range
There are three different types of average:
MODE: the most common value
MEAN: sum of values ÷ number of values
MEDIAN: the middle value
The range is not an average, but it tells you how spread out the data is:
RANGE: largest value – smallest value
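To make the four definitions concrete, here is a minimal Python sketch built on the standard-library `statistics` module (`multimode` needs Python 3.8 or later). The helper name `summarise` and the example scores are hypothetical illustrations, not taken from the slides.

```python
import statistics


def summarise(values):
    """Hypothetical helper: return the mode(s), mean, median and range of a list."""
    return {
        # multimode lists every value that is equally the most common
        "mode": statistics.multimode(values),
        # sum of values divided by number of values
        "mean": sum(values) / len(values),
        # middle value once the values are put in order
        "median": statistics.median(values),
        # largest value minus smallest value: a measure of spread, not an average
        "range": max(values) - min(values),
    }


scores = [3, 5, 5, 6, 8, 9, 13]  # hypothetical data
print(summarise(scores))
# {'mode': [5], 'mean': 7.0, 'median': 6, 'range': 10}
```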
Favourite athletics event
This graph shows pupils’ favourite athletics events.
[Bar chart: frequency (up to 20) for each event: sprint, long distance running, hurdles, high jump, long jump, triple jump, shot, discus, javelin.]
Which is the most popular event? How do you know?
Notes: Ensure pupils understand the meaning of the word “frequency”. You may want to draw attention to the fact that the bars are separated because the data is discrete. Ask other questions such as “Which is the least favourite event? Why do you think this is? Would a survey like this vary from school to school?”
The mode
The most common item is called the mode: it is the item that occurs most often in a data set.
In the graph, the mode is sprint because it is represented by the highest bar. We could also say “The modal athletics event is sprint.”
Is it possible to have more than one modal value?
Yes: two or more events could be equally the most popular.
Is it possible to have no modal value?
Yes: if all the events are equally popular, there is no mode.
The mode
We could write out all the results in a list. The list would begin:
[Start of the list of favourite events.]
How many words (items) would there be in the list altogether?
How could we work out the mode from the list if we didn’t have the graph?
Can we tell how many pupils took part in the survey?
Notes: There are 87 items in the list. Pupils would need to count which word occurs most often. We can’t work out how many pupils took part unless we know that each pupil was allowed to choose only one event.
The mode
Here are the attendance figures at a weekly school athletics club for Year 11.
Discuss:
Over how many weeks were the results collected?
What is the modal number of pupils attending?
Are there any unusual results in the data set?
A result like this is called an outlier. Can you think of any possible reasons for the outlier?
If the data set were very long, what would be the best way to find the mode?
Notes: These questions could be discussed in pairs. A possible reason for the outlier is that the Year 11s were on study leave, that it was half term, or that the coach or teacher was absent. It would be sensible to put the results into a tally chart if the data set were very long.
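The tally-chart approach can be sketched in Python with `collections.Counter`, which counts how often each item occurs. This is a minimal sketch: the helper name `modes_from_tally` and the attendance figures are hypothetical. Following the convention on the earlier mode slide, it returns every value tied for the highest count, and an empty list when all values are equally common (no mode).

```python
from collections import Counter


def modes_from_tally(values):
    """Hypothetical helper: tally the values and return every modal value.

    Returns an empty list when all distinct values occur equally often,
    which we treat as "no mode".
    """
    tally = Counter(values)          # value -> frequency, like a tally chart
    highest = max(tally.values())
    modal = [value for value, count in tally.items() if count == highest]
    if len(tally) > 1 and len(modal) == len(tally):
        return []                    # everything ties, so nothing stands out
    return modal


attendance = [14, 15, 15, 16, 15, 14, 3, 16, 15]  # hypothetical weekly figures
print(modes_from_tally(attendance))  # [15]
```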
Favourite athletics event
Compare this graph to the previous one.
[Bar chart: frequency (up to 20) for each event: sprint, long distance running, hurdles, high jump, long jump, triple jump, shot, discus, javelin.]
What conclusions can you draw? Which two groups of pupils could be represented by the two graphs?
Notes: This graph should stimulate discussion about the possibility of two modes. In the second graph, the bars are generally higher (122 votes). The two groups of pupils could be male and female, two age groups, or two schools where sport has a higher profile in one of the schools. This could lead to discussion about why some people like sport more than others, what can be done to encourage more people to take part in sport, and whether schools or the government should try to influence people’s lifestyles.
How many sports do you play?
A group of pupils were asked how many sports they played. This graph shows the results.
[Bar chart: frequency against number of sports played.]
What is the modal number of sports played?
How many pupils took part in the survey?
How many pupils play more than two sports?
Notes: This graph represents numerical data. Some foundation pupils may confuse the frequency with the number of sports, i.e. give 13 as the mode rather than 1. In this situation, you could begin writing out the whole list of results to illustrate the meaning of the numbers on the two axes. Pupils might want to discuss why so many people play no sports at all and whether this is a good thing.
Grouped data
This graph represents Year Ten girls’ times for a 100 m sprint race.
[Bar chart with no gaps between the bars: frequency against time in seconds, from 12 to 20 seconds.]
What is the modal time interval?
How many girls are in this interval?
Notes: The modal group is 16 ≤ t < 17 seconds, although you might not want to introduce this notation yet, depending on the level of your pupils. Explain that the number at the left-hand end of a bar is included in that bar, but the number at the right-hand end is not. For example, 16 seconds exactly is included in the modal group, but 17 seconds would be included in the next group. This is an example of continuous data, so the bars are joined together. Discuss how accurately the times might have been measured, e.g. to the nearest tenth or hundredth of a second. You might also want to discuss the shape of the graph: why does it peak in the middle and taper off at each end?
When the mode is not appropriate
Another survey is carried out among university students. The results are represented in this table:
Number of sports played    0    1    2    3    4    5    6
Frequency                 20   17   15   10    9    3    2
A newspaper reporter writes: “You may be surprised to learn that the average number of sports played by university students is 0.”
Do you think this is a fair representation of the data?
Why is the mode a misleading average in this example?
Should the reporter say which average has been used?
Notes: It is not fair because the mode does not show that most of the students play one or more sports. (You could ask pupils to work out exactly what percentage this is.) Pupils may suggest that the mean or the median is a fairer way of representing the data. This will be covered on a later slide.
Skewed data
Data that is heavily weighted towards one end of the data set is said to be skewed. When data is skewed, the mode is not an appropriate average.
[Two bar charts of frequency against number of sports played: one labelled “Negatively skewed data”, the other labelled “Positively skewed data”.]
Notes: Ask pupils to say what the modal number of sports played is in each graph, and to explain why it does not represent the data very fairly.
D2.2 The mean
Comparing data
St Clement Danes School holds an inter-form athletics competition for Year 10. Each class must select its five best boys and five best girls for each event.
Here are the times in seconds for the 100 metres sprint for the two best classes.
[Table: 100 m times for 10C boys, 10C girls, 10B boys and 10B girls.]
Which class should win and why?
Notes: The data provided is taken from Year 10 in 2004 at St Clement Danes School, Hertfordshire. Times have been rounded to the nearest tenth of a second. The discussion should involve looking at boys’ and girls’ times separately and together. The next three slides show how to calculate the means.
The mean
The mean is the most commonly used average. To calculate the mean of a set of values, we add the values together and divide by the total number of values.
Mean = sum of values ÷ number of values
For example, the mean time for Class 10B girls is:
(sum of the five times) ÷ 5 = 73.6 ÷ 5 = 14.72 seconds
The mean
Calculate the mean times for the other three groups.
Group        10C boys   10C girls   10B boys   10B girls
Mean time    12.64 s    15.78 s     13.18 s    14.72 s
Now calculate the means for Class 10B and Class 10C (with girls and boys combined).
Class        Class 10C   Class 10B
Mean time    14.21 s     13.95 s
Based on these results, who should win?
Notes: Discuss an appropriate level of accuracy. Pupils would benefit from a printout of the data.
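Because each group has exactly five pupils, a class mean is the total of all ten times divided by 10, which here equals the average of the boys’ and girls’ means. A minimal Python sketch, using the group means from the slide; the helper `combined_mean` is a hypothetical name.

```python
def combined_mean(groups):
    """Hypothetical helper: mean of several groups combined; groups is a list of (mean, size) pairs."""
    total = sum(mean * size for mean, size in groups)  # rebuild each group's total
    count = sum(size for _, size in groups)
    return total / count


class_10b = combined_mean([(13.18, 5), (14.72, 5)])  # 10B boys, 10B girls
class_10c = combined_mean([(12.64, 5), (15.78, 5)])  # 10C boys, 10C girls
print(round(class_10b, 2), round(class_10c, 2))  # 13.95 14.21
```

Weighting by group size only matters when the groups differ in size; with equal sizes, the result is the simple average of the group means. Since these are times, the lower mean is the faster class.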
Calculating the mean
Notes: Pupils could make their own clouds too. Working from the mean and the number of items, they should generate their own list of numbers to fit.
Calculating a missing data item
Notes: Pupils could hide one of the numbers in their own cloud and have a partner work it out.
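The arithmetic behind this activity: all the values must add up to mean × number of items, so the hidden value is that total minus the values you can see. A minimal Python sketch; the function name `missing_value` and the example cloud are hypothetical.

```python
def missing_value(mean, count, known_values):
    """Hypothetical helper: find one hidden value from the mean, the count and the other values."""
    if len(known_values) != count - 1:
        raise ValueError("expected exactly one missing value")
    total = mean * count              # what all the values must add up to
    return total - sum(known_values)  # what is left over for the hidden value


# Hypothetical cloud: five numbers with a mean of 6, one of them hidden
print(missing_value(6, 5, [4, 9, 2, 7]))  # 8
```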
Outliers and their effect on the mean
The school athletics team takes part in an inter-schools competition. James’s shot results (in metres) are below.
Discuss:
What is the mean throw?
Is this a fair representation of James’s ability? Explain.
What would be a fair way for the competition to operate?
A data item that is significantly higher or lower than the other items is called an outlier. Outliers affect the mean: a very low outlier reduces it and a very high outlier increases it.
Notes: The mean is (sum of the seven throws) ÷ 7 = 8.81 m. The best throw could be used, or the worst throw removed.
Outliers and their effect on the mean
Here are some 1500 metre race results in minutes.
Discuss:
Are there any outliers?
Will the mean be increased or reduced by the outlier?
Calculate the mean with the outlier.
Now calculate the mean without the outlier. How much does it change?
The mean with the outlier is 59.1 ÷ 9 = 6.57 minutes.
The mean without the outlier is (sum of the other eight times) ÷ 8 = 6.06 minutes, about 0.5 minutes lower.
Notes: Another example of when an outlier could occur is an experiment on reaction time, where an anomalous result (e.g. if the participant’s hand slips) will be very large compared with the rest of the results. It may be appropriate in research or experiments to remove an outlier before analysing the results.
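The with-and-without comparison can be sketched in Python. The nine times below are hypothetical, chosen so that one very slow time stands out; `mean_without` is likewise a hypothetical helper that removes one occurrence of the outlier before averaging.

```python
def mean(values):
    return sum(values) / len(values)


def mean_without(values, outlier):
    """Hypothetical helper: mean after removing one occurrence of the given outlier."""
    trimmed = list(values)
    trimmed.remove(outlier)
    return mean(trimmed)


times = [5.8, 5.9, 6.0, 6.0, 6.1, 6.1, 6.2, 6.3, 10.5]  # hypothetical, in minutes
with_outlier = mean(times)
without_outlier = mean_without(times, 10.5)
# The slow outlier pulls the mean up; removing it lowers the mean
print(round(with_outlier, 2), round(without_outlier, 2))  # 6.54 6.05
```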
D2.3 Calculating the mean from frequency tables
Calculating the mean from a frequency table
Here are the results of a survey carried out among university students.
Number of sports played    0    1    2    3    4    5    6
Frequency                 20   17   15   10    9    3    2
If you were to write out the whole list of results, what would it look like?
What do you think the mean will be?
Notes: Pupils may benefit from calculating the mean from the list before they are ready to appreciate the multiplication method.
Calculating the mean from a frequency table
Number of sports played    Frequency    Number of sports × frequency
0                          20           0 × 20 = 0
1                          17           1 × 17 = 17
2                          15           2 × 15 = 30
3                          10           3 × 10 = 30
4                           9           4 × 9 = 36
5                           3           5 × 3 = 15
6                           2           6 × 2 = 12
TOTAL                      76           140
Mean = 140 ÷ 76 ≈ 1.84 = 2 sports (to the nearest whole number)
Notes: Ask pupils to estimate the mean first. Discuss a suitable level of accuracy for rounding off in the context of discrete data.
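The multiplication method can be sketched in Python with the frequency table held as a dictionary from number of sports to frequency. The function name is a hypothetical choice; the figures are the survey results from the table.

```python
def mean_from_frequency_table(table):
    """Hypothetical helper: table maps each value to its frequency.

    Returns (sum of value x frequency, total frequency, mean).
    """
    total_frequency = sum(table.values())
    # multiply each value by its frequency, then add up the products
    total = sum(value * frequency for value, frequency in table.items())
    return total, total_frequency, total / total_frequency


sports = {0: 20, 1: 17, 2: 15, 3: 10, 4: 9, 5: 3, 6: 2}
total, n, mean = mean_from_frequency_table(sports)
print(total, n, round(mean, 2))  # 140 76 1.84
```

This gives the same answer as writing out all 76 results and averaging them, without having to build the full list.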
Grouped data
Here are the Year Ten boys’ javelin scores.
Javelin distance in metres    Frequency
5 ≤ d < 10                    1
10 ≤ d < 15                   8
15 ≤ d < 20                   12
20 ≤ d < 25                   10
25 ≤ d < 30                   3
30 ≤ d < 35                   1
35 ≤ d < 40                   1
Total                         36
How could you calculate the mean from this data?
How is the data different from the previous examples you have calculated with?
Notes: The data has been grouped. Because the data is grouped, we do not know the individual scores, so it is not possible to add them up.
Midpoints
Javelin distance in metres    Frequency
5 ≤ d < 10                    1
10 ≤ d < 15                   8
15 ≤ d < 20                   12
20 ≤ d < 25                   10
25 ≤ d < 30                   3
30 ≤ d < 35                   1
35 ≤ d < 40                   1
It is possible to find an estimate for the mean. This is done by finding the midpoint of each group.
To find the midpoint of the group 10 ≤ d < 15:
10 + 15 = 25
25 ÷ 2 = 12.5 m
Find the midpoints of the other groups.
Notes: The other midpoints are displayed on the next slide. Point out the link between the midpoint and the median/mean. Discuss the fact that the scores within a group are likely to be spread evenly, i.e. half above and half below the midpoint. This is the best assumption to make, although it is obviously not always true. (The larger the data set, the more likely this is to be the case.) Some pupils may point out that 15 is not included in the group 10 ≤ d < 15; however, since a distance in this group can get very close to 15, this makes no difference to an estimated mean.
Estimating the mean from grouped data
Javelin distance in metres    Frequency    Midpoint    Frequency × midpoint
5 ≤ d < 10                    1            7.5         1 × 7.5 = 7.5
10 ≤ d < 15                   8            12.5        8 × 12.5 = 100
15 ≤ d < 20                   12           17.5        12 × 17.5 = 210
20 ≤ d < 25                   10           22.5        10 × 22.5 = 225
25 ≤ d < 30                   3            27.5        3 × 27.5 = 82.5
30 ≤ d < 35                   1            32.5        1 × 32.5 = 32.5
35 ≤ d < 40                   1            37.5        1 × 37.5 = 37.5
TOTAL                         36                       695
Estimated mean = 695 ÷ 36 = 19.3 m (to 1 d.p.)
Notes: Ask pupils to estimate the mean first. Discuss a suitable level of accuracy for rounding off in the context of continuous data.
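The midpoint method can be sketched in Python. Each group is stored as a hypothetical `(lower, upper, frequency)` tuple meaning lower ≤ d < upper, and, as on the slide, every value in a group is assumed to sit at the group’s midpoint, so the result is only an estimate.

```python
def estimated_mean(groups):
    """Hypothetical helper: estimate the mean of grouped data from (lower, upper, frequency) tuples."""
    total = 0.0
    count = 0
    for lower, upper, frequency in groups:
        midpoint = (lower + upper) / 2  # e.g. (10 + 15) / 2 = 12.5
        total += frequency * midpoint
        count += frequency
    return total / count


javelin = [(5, 10, 1), (10, 15, 8), (15, 20, 12), (20, 25, 10),
           (25, 30, 3), (30, 35, 1), (35, 40, 1)]
print(round(estimated_mean(javelin), 1))  # 19.3
```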
How accurate is the estimated mean?
Here are the javelin distances thrown by Year 10 before the data was grouped.
35.00, 31.05, 28.89, 25.60, 25.33, 24.11, 23.50, 21.82, 21.78, 21.77, 21.60, 21.00, 20.70, 20.20, 20.00, 19.50, 18.82, 17.35, 17.31, 16.64, 15.79, 15.75, 15.69, 15.52, 15.25, 15.00, 14.50, 12.80, 12.50, 12.00, 11.85, 10.00, 9.50
Work out the mean from the original data above and compare it with the estimated mean found from the grouped data.
How accurate was the estimated mean?
The estimated mean is 19.3 metres (to 1 d.p.).
The actual mean is 18.7 metres (to 1 d.p.).
Notes: Emphasise that although the estimated mean can be quite accurate, it is preferable to use the original data if it is available. This can be particularly relevant in GCSE coursework.
D2.4 The median
Calculate the median of the 1500 m results
The median is the middle number when all the numbers are in order.
Write the results in order and find the middle value.
The median is not affected by the value of the outlier.
Why is this a more appropriate average than the mean for these results?
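A quick Python check of the claim that the median ignores how extreme an outlier is. The race times are hypothetical; making the slowest time even slower leaves the median unchanged but pushes the mean up.

```python
import statistics

times = [5.8, 5.9, 6.0, 6.0, 6.1, 6.1, 6.2, 6.3, 10.5]  # hypothetical, in minutes
slower = times[:-1] + [15.0]  # make the outlier even more extreme

print(statistics.median(times), statistics.median(slower))  # 6.1 6.1
print(statistics.mean(times) < statistics.mean(slower))     # True
```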
Choosing the most appropriate average
What are the mean and median for these sets of attendance figures for three lunchtime activities?
[Table: attendance figures for the choir, drama club and orchestra. Choir: 23, 22, 21, 20, 19, 18, 17.]
Explain your answers.
To decide which of the three activities is the most popular, which average is the better one to use? Why?
Notes: For the drama club and choir, the means and medians are all 20. In the choir, the numbers are evenly distributed around the middle. For the orchestra, the mean is 157 ÷ 7 = 22.4 and the median is 20; the mean is pushed up by the high figures at the end. The mean is better here because it takes account of all the data; there is not just one outlier, so it is appropriate to include all the data. The orchestra is clearly the most popular on average, but the median does not show this.
Outliers and the median and mean
For each set of data, say which would be the most appropriate average and why.
Notes: This activity illustrates the way that outliers affect the mean but do not affect the median.
When there are two middle numbers
Here are 10B girls’ long jump results in metres.
How could you work out the median jump?
If there are two middle numbers, you need to find what is halfway between them.
If the numbers are far apart, a quick way to find the middle of the two numbers is to add them together and divide by two:
2.80 m + 3.32 m = 6.12 m
6.12 m ÷ 2 = 3.06 m
The median is 3.06 m.
Finding halfway between two numbers
Notes: Reset to generate different examples. Sometimes the halfway number will need to be found; on other occasions, it will be given and one of the two endpoints will be hidden.
One or two middle numbers?
If there are 9 numbers in a list, will there be 1 or 2 middle numbers?
If there are 10 numbers in a list, will there be 1 or 2 middle numbers?
If there is an odd number of numbers in a list, there will be one middle number.
If there is an even number of numbers in a list, there will be two middle numbers.
Notes: Discuss what the median is in each case.
When there are two middle numbers
To find the position of the middle number in a very long list, call the number of numbers n. The middle number is then the
(n + 1) ÷ 2 th number in the list.
For example:
There are 100 numbers in a list. Where is the median?
101 ÷ 2 = 50.5, so the median is the 50.5th number in the list (halfway between the 50th and the 51st).
There are 37 numbers in a list. Where is the median?
38 ÷ 2 = 19, so the median is the 19th number in the list.
Notes: Pupils should first predict whether there will be one or two middle numbers. They should see that when n is even, n + 1 is odd, and an odd number divided by 2 always gives an answer ending in .5, so there will be two middle numbers. Remind pupils that these numbers are not the median itself; they are just the position of the median in the list.
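The (n + 1) ÷ 2 rule translates directly into code. A minimal Python sketch with hypothetical function names: when the position ends in .5, the median is halfway between the two middle numbers. Positions on the slide count from 1, while Python lists count from 0, hence the `- 1`.

```python
def median_position(n):
    """Hypothetical helper: position of the median in an ordered list of n values, counting from 1."""
    return (n + 1) / 2


def median(values):
    ordered = sorted(values)
    position = median_position(len(ordered))
    if position.is_integer():
        return ordered[int(position) - 1]   # one middle number
    below = ordered[int(position) - 1]      # e.g. position 50.5 -> 50th value
    above = ordered[int(position)]          # ... and the 51st value
    return (below + above) / 2              # halfway between the two middle numbers


print(median_position(100))  # 50.5
print(median_position(37))   # 19.0
print(median([7, 1, 5, 3]))  # 4.0
```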
Where is the median?
Notes: Discuss how to find the median; press reset to use a new data set. Sometimes there will be an odd number and sometimes an even number of items.
D2.5 Comparing data
The range
Here are the high jump scores for two girls in metres.
Joanna: 1.62, 1.41, 1.35, 1.20, 1.15
Kirsty: 1.59, 1.45, 1.30
Find the range for each girl’s results and use this to find out who is more consistent.
Joanna’s range = 1.62 – 1.15 = 0.47 m
Kirsty’s range = 1.59 – 1.30 = 0.29 m
Notes: Discuss the consistency of the two jumpers: Joanna has the highest score, but Kirsty is more consistent.
The range
The highest and lowest scores can be useful in deciding who is more consistent.
The lowest score subtracted from the highest score is called the range.
Remember that the range is not an average but a measure of spread.
If the scores are spread out, the range will be larger and the scores less consistent.
If the scores are close together, the range will be smaller and the scores more consistent.
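A minimal Python sketch of using the range to compare consistency, with the high jump scores from the slide. The helper `value_range` is a hypothetical name (Python’s built-in `range` means something else).

```python
def value_range(values):
    """Hypothetical helper: range = highest value - lowest value (a measure of spread)."""
    return max(values) - min(values)


jumps = {
    "Joanna": [1.62, 1.41, 1.35, 1.20, 1.15],
    "Kirsty": [1.59, 1.45, 1.30],
}

for name, scores in jumps.items():
    print(name, round(value_range(scores), 2))
# Joanna 0.47
# Kirsty 0.29

# A smaller range means the scores are closer together, i.e. more consistent
most_consistent = min(jumps, key=lambda name: value_range(jumps[name]))
print(most_consistent)  # Kirsty
```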
The range
Joanna: 1.62, 1.41, 1.35, 1.20, 1.15
Kirsty: 1.59, 1.45, 1.30
Calculate the mean and the range for each girl.
           Joanna    Kirsty
Mean       1.35 m    1.41 m
Range      0.47 m    0.29 m
Use these results to decide which one you would enter into the athletics competition and why.
Notes: Discuss the consistency of the two jumpers: Joanna has the highest score, but Kirsty is more consistent. Kirsty also has a higher mean. Performing well under pressure is a very important skill for athletes, so Kirsty may be a better choice. You might also want to use words like “reliable” in this context.
Calculating the mean, median and range
Notes: Each time the activity is reset, a new set of data is generated. Calculate the mean, median and range. This could be done as a competition in teams or with mini whiteboards. If required, ask a volunteer to come to the board and use the pen tool to write the given data set in order first.
Comparing sets of data
Here is a summary of Chris and Rob’s performance in the 200 metres over a season. They each ran 10 races.
           Chris           Rob
Mean       24.8 seconds    25.0 seconds
Range      1.4 seconds     0.9 seconds
Which of these conclusions are correct?
1) Robert is more reliable.
2) Robert is better because his mean is higher.
3) Chris is better because his range is higher.
4) Chris must have run a better time for his quickest race.
5) On average, Chris is faster but he is less consistent.
Notes: The first and the last statements are correct. The second statement is not correct because a higher mean time means he is slower. The third statement is not correct because a higher range means he is less consistent. The data on the next slide will illustrate why the fourth statement is not always correct.
Comparing sets of data
           Chris           Rob
Mean       24.8 seconds    25.0 seconds
Range      1.4 seconds     0.9 seconds
Here is the original data for Chris and Rob.
[Two data sets of ten 200 m times in seconds.]
Use the summary table above to decide which data set is Chris’s and which is Rob’s.
Who has the best time?
Who has the worst time?
Notes: The first set of data is Chris’s and the second is Rob’s. The data illustrates why the fourth statement is not always correct: Chris has the lower (faster) mean, but Rob has the best time of 24.3 seconds.
Comparing hurdles scores
Here are the top eleven hurdles scores in seconds for Year 9 and Year 10.
Year 9: 12.1, 14.0, 15.3, 15.4, 15.6, 15.7, 16.1, 16.7, 17.0
Year 10: 12.3, 13.7, 15.5, 15.6, 15.9, 16.0, 16.1, 17.1, 22.9
Work out the mean and range.
           Year 9    Year 10
Mean       15.4      16.1
Range      4.9       10.6
Which year group do you think is better and why?
Why might Year 10 feel the comparison is unfair?
Notes: Discuss the fact that the extreme value of 22.9 seconds significantly affects the mean and range for Year 10.
Finding the interquartile range
The time of 22.9 seconds is an outlier.
When there are outliers in the data, it is more appropriate to calculate the interquartile range.
The interquartile range is the range of the middle half of the data.
The lower quartile is the data value that is a quarter of the way along the ordered list.
The upper quartile is the data value that is three quarters of the way along the ordered list.
interquartile range = upper quartile – lower quartile
Locating the upper and lower quartiles
There are 11 times in each list.
Year 9: 12.1, 14.0, 15.3, 15.4, 15.6, 15.7, 16.1, 16.7, 17.0
Year 10: 12.3, 13.7, 15.5, 15.6, 15.9, 16.0, 16.1, 17.1, 22.9
Where is the median in each list?
Where is the lower quartile in each list?
Where is the upper quartile in each list?
Interquartile range for Year 9: 16.1 – 15.3 = 0.8
Interquartile range for Year 10: 16.1 – 15.5 = 0.6
Notes: Establish that if there are 11 data values, the median will be the 6th value in the list, leaving 5 values on either side. The lower and upper quartiles are in the middle of the 5 remaining values on each side. Note that the median, lower quartile and upper quartile will only all be actual values in the data set when the number of values is 4k – 1, where k is a positive whole number. When the number of values is not of this form, it is acceptable to approximate the upper and lower quartiles using the closest value. The interquartile range for Year 10 is smaller than the interquartile range for Year 9. Pupils could be asked to construct a box-and-whisker diagram to compare this data. Link: D4.5 Box-and-whisker diagrams.
The location of quartiles in an ordered data set
When there are n values in an ordered data set:
The lower quartile = the (n + 1) ÷ 4 th value
The median = the (n + 1) ÷ 2 th value
The upper quartile = the 3(n + 1) ÷ 4 th value
The interquartile range = the upper quartile – the lower quartile
Notes: The median, lower quartile and upper quartile can also be estimated from a cumulative frequency graph. Links: D4.4 Using cumulative frequency graphs, D4.5 Box-and-whisker diagrams.
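The position formulas can be sketched in Python, with the standard-library `statistics.quantiles` (Python 3.8 or later) used to find the quartile values. With `method="exclusive"` it places the quartiles at the same (n + 1)-based positions, but it interpolates between neighbouring values when a position is not a whole number, rather than taking the closest value as the slides allow, so answers may differ slightly. Both helper names are hypothetical, and the example data is simply the numbers 1 to 11.

```python
import statistics


def quartile_positions(n):
    """Hypothetical helper: positions (counting from 1) of the LQ, median and UQ."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4


def interquartile_range(values):
    # method="exclusive" uses the same (n + 1)-based positions as above,
    # interpolating between neighbouring values when a position is not whole
    lower_q, _median, upper_q = statistics.quantiles(values, n=4, method="exclusive")
    return upper_q - lower_q


print(quartile_positions(11))                   # (3.0, 6.0, 9.0)
print(interquartile_range(list(range(1, 12))))  # 6.0
```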
Finding the interquartile range
Notes: Use the activity to discuss how to find the interquartile range. Pressing the play button reveals how this is found. Resetting will produce a new data set. Link this activity to box-and-whisker diagrams by also finding the minimum value, the maximum value and the median, and asking pupils to construct the corresponding box-and-whisker diagram. Links: D4.4 Using cumulative frequency graphs, D4.5 Box-and-whisker diagrams.
Review
To review the work you have covered in this topic:
1) Play “Guess the word”. Write out the key words on cards and shuffle the cards. Describe the word on each card to your partner; your partner must guess the word. Do as many as you can in one minute, then swap over.
2) Make up challenges involving sets of data for your partner, such as working out the mean.
3) Make a list of possible mistakes to avoid in this topic.