# KS4 Mathematics D2 Averages and range.

## Presentation on theme: "KS4 Mathematics D2 Averages and range."— Presentation transcript:

KS4 Mathematics D2 Averages and range

D2 Averages and range

Contents

- D2.1 The mode
- D2.2 The mean
- D2.3 Calculating the mean from frequency tables
- D2.4 The median
- D2.5 Comparing data

The three averages and range
There are three different types of average:

- MODE: most common
- MEAN: sum of values ÷ number of values
- MEDIAN: middle value

The range is not an average, but tells you how the data is spread out:

- RANGE: largest value – smallest value
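The four definitions above can be tried out with a short Python sketch. The data set `values` is invented for illustration and does not come from the presentation; `mean`, `median` and `mode` are functions from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical data set, invented for illustration only.
values = [2, 4, 4, 5, 6, 9, 12]

most_common = mode(values)           # MODE: the value that occurs most often
average = mean(values)               # MEAN: sum of values / number of values
middle = median(values)              # MEDIAN: middle value of the ordered list
spread = max(values) - min(values)   # RANGE: largest value - smallest value

print(most_common, average, middle, spread)
```

With this made-up list the three averages are all different, which is a useful reminder that "the average" is ambiguous unless the type is stated.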

Favourite athletics event
This graph shows pupils’ favourite athletics events.

[Bar chart: frequency (axis marked 5 to 20) for each event: Sprint, Long distance running, Hurdles, High jump, Long jump, Triple jump, Shot, Discus, Javelin]

Which is the most popular event? How do you know?

Ask other questions such as “Which is the least favourite event? Why do you think this is? Will a survey like this vary from school to school?” Ensure pupils understand the meaning of the word “frequency”. You may want to draw attention to the fact that the bars are separated because the data is discrete.

The mode The most common item is called the mode.
The mode is the item that occurs the most often in a data set. In the graph the mode is sprint because it is represented by the highest bar. We could also say “The modal athletic event is sprint.” Is it possible to have more than one modal value? Yes: two or more events could be equally the most popular. Is it possible to have no modal value? Yes: all events could be equally popular.
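The two special cases (several modes, or no mode) can be illustrated with a small sketch. The vote lists are invented, and `school_modes` is a hypothetical helper name. Treating "every value equally common" as "no mode" is the convention described in the notes above, not something Python enforces: `statistics.multimode` simply returns every value tied for the highest count.

```python
from statistics import multimode

def school_modes(values):
    """Return the modal values, or an empty list when every value is equally common."""
    modes = multimode(values)      # all values tied for the highest count
    distinct = set(values)
    if len(distinct) > 1 and len(modes) == len(distinct):
        return []                  # every value ties, so nothing stands out as the mode
    return modes

# Hypothetical survey answers, invented for illustration.
two_modes = school_modes(["sprint", "hurdles", "sprint", "javelin", "hurdles"])
no_mode = school_modes(["sprint", "hurdles", "javelin"])

print(two_modes, no_mode)
```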

The mode We could write out all the results in a list. The list would begin: How many words (items) would there be in the list altogether? How could we work out the mode from the list if we didn’t have the graph? There are 87 items in the list. Pupils would need to count which word occurred the most often. We can’t work out how many pupils took part unless we know they were only allowed to choose one event. Can we tell how many pupils took part in the survey?

The mode Here are the attendance figures at a weekly school athletics club for Year 11. Discuss : Over how many weeks were the results collected? What is the modal number of pupils attending? Are there any unusual results in the data set? This result is called an outlier. Can you think of any possible reasons for the outlier? These questions could be discussed in pairs. A possible reason for the outlier is that the Year 11s were on study leave; or that it was half term; or the coach/ teacher was absent. It would be sensible to put the results into a tally chart if the data set were very long. If the data set were very long, what would be the best way to find the mode?

Favourite athletics event
Compare this graph to the previous one.

[Bar chart: frequency (axis marked 2 to 20) for each event: Sprint, Long distance running, Hurdles, High jump, Long jump, Triple jump, Shot, Discus, Javelin]

What conclusions can you draw? Which two groups of pupils could be represented by the two graphs?

This graph should stimulate discussion about the possibility of two modes. In the second graph, the bars are generally higher (122 votes). The two groups of pupils could be male and female, or perhaps two age groups or two schools where sport has a higher profile in one of the schools. This could stimulate discussion about why some people like sport better than others and what can be done to encourage more people to take part in sport, as well as whether people should be influenced about their lifestyles by schools or the government.

How many sports do you play?
A group of pupils were asked how many sports they played. This graph shows the results.

[Bar chart: frequency (axis marked 2 to 14) against number of sports played]

What is the modal number of sports played? How many pupils play more than two sports? How many pupils took part in the survey?

This graph represents numerical data. Some foundation pupils may confuse the frequency with the number of sports, i.e. give 13 as the mode rather than 1. In this situation, you could begin writing out the whole list of results to illustrate the meanings of the numbers on the two axes. Pupils might want to discuss why so many people play no sports at all and whether this is a good thing.

Grouped data This graph represents Year Ten girls’ times for a 100m sprint race.

[Histogram: frequency (axis marked 2 to 10) against time in seconds (12 to 20)]

What is the modal time interval? How many girls are in this interval?

The modal group is 16 ≤ t < 17 seconds, although you might not want to introduce this notation yet, depending on the level of your pupils. Explain that the number at the left-hand end of each bar is included in the bar, but the number at the right-hand end is not. For example, for the modal group, 16 seconds exactly is included but 17 seconds would be included in the next group. This is an example of continuous data, so the bars are joined together. Discuss how accurately the times might have been measured, e.g. to the nearest tenth or hundredth of a second. You might also want to discuss the shape of the graph. Why does it peak in the middle and taper off at each end?

When the mode is not appropriate
Another survey is carried out among university students. The results are represented in this table:

| Number of sports played | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Frequency | 20 | 17 | 15 | 10 | 9 | 3 | 2 |

A newspaper reporter writes: “You may be surprised to learn that the average number of sports played by university students is 0.”

Do you think this is a fair representation of the data? Why is the mode a misleading average in this example? Should the reporter say which average has been used?

It is not fair because the mode does not show that most of the students play 1 or more sports. (You could ask them to work out exactly what percentage this is.) Pupils may suggest that the mean or the median is a fairer way of representing the data. This will be covered on a later slide.
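The percentage mentioned in the note can be worked out as in the sketch below. The frequencies are the ones listed for this survey in the presentation's frequency table (0 to 6 sports); the variable names are illustrative.

```python
# Number of sports played -> frequency, as listed in the presentation's frequency table.
frequency = {0: 20, 1: 17, 2: 15, 3: 10, 4: 9, 5: 3, 6: 2}

total_students = sum(frequency.values())
plays_at_least_one = total_students - frequency[0]   # everyone except the "0 sports" group
percentage = 100 * plays_at_least_one / total_students

print(f"{plays_at_least_one} of {total_students} students "
      f"({percentage:.1f}%) play at least one sport")
```

So although 0 is the mode, roughly three quarters of the students play at least one sport, which is why the reporter's statement is misleading.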

Skewed data Data that is heavily weighted towards one end of the data set is said to be skewed. When data is skewed, the mode is not an appropriate average.

[Two bar charts of frequency against number of sports played: one labelled “Negatively skewed data”, the other “Positively skewed data”]

Ask pupils to say what the modal number of sports played is in each graph, and to explain why it does not represent the data very fairly.


Comparing data St Clement Danes School holds an inter-form athletics competition for Year 10. Each class must select their five best boys and five best girls for each event. Here are the times in seconds for 100 metres sprint for the two best classes. 13.1 16.5 14.3 15.4 16.4 13.8 12.8 15.9 13.4 15.3 12.2 15.2 12.9 14.7 12.0 14.9 11.5 10C boys 10C girls 10B boys 10B girls The data provided is taken from Year 10 in 2004 at St Clement Danes School, Hertfordshire. Times have been rounded to the nearest tenth. The discussion should involve looking at boys’ and girls’ scores separately and together. The next three slides show how to calculate the means. Which class should win and why?

The mean The mean is the most commonly used average.
To calculate the mean of a set of values we add together the values and divide by the total number of values.

Mean = sum of values ÷ number of values

For example, the mean time for Class 10B girls is: (sum of the five times) ÷ 5 = 73.6 ÷ 5 = 14.72 seconds

The mean Calculate the mean times for the other three groups.

| Group | Mean time (seconds) |
| --- | --- |
| 10B girls | 14.72 |
| 10B boys | 13.18 |
| 10C girls | 15.78 |
| 10C boys | 12.64 |

Now calculate means for Class 10B and Class 10C (with girls and boys combined).

| Class | Mean time (seconds) |
| --- | --- |
| Class 10B | 13.95 |
| Class 10C | 14.21 |

Discuss an appropriate level of accuracy. Pupils would benefit from a print out of the data.

Based on these results, who should win?
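The combined class means can be checked with the sketch below. One assumption needs stating: the transcript lists the four group means without clear labels, so the pairing used here (10B girls 14.72, 10B boys 13.18, 10C girls 15.78, 10C boys 12.64) is inferred from the fact that it is the only pairing that reproduces the combined means 13.95 and 14.21. Because each group has five runners, the class mean is simply the average of its two group means.

```python
# Group means in seconds; the pairing of values with groups is inferred, not stated in the source.
group_means = {
    "10B": {"girls": 14.72, "boys": 13.18},
    "10C": {"girls": 15.78, "boys": 12.64},
}

# Worked example from the slide: the five 10B girls' times total 73.6 seconds.
mean_10b_girls = 73.6 / 5

# Equal group sizes (five girls, five boys), so average the two group means.
class_means = {
    name: round((means["girls"] + means["boys"]) / 2, 2)
    for name, means in group_means.items()
}

print(round(mean_10b_girls, 2), class_means)
```

If the groups had different sizes, the two group means would need to be weighted by the number of runners in each group instead.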

Calculating the mean Pupils could make their own clouds too. Working from the mean and the number of items, they should generate their own list of numbers to fit.

Calculating a missing data item
Pupils could hide one of the numbers in their own cloud, and have a partner work it out.
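The reasoning behind finding a hidden value can be sketched as follows: the mean multiplied by the number of items gives the total, and subtracting the known values leaves the missing one. The numbers and the function name `missing_value` are invented for illustration.

```python
def missing_value(target_mean, count, known_values):
    """Return the one hidden value, given the mean of all `count` values."""
    required_total = target_mean * count      # mean x number of values = sum of values
    return required_total - sum(known_values)

# Hypothetical cloud: five numbers with mean 6, one of them hidden.
hidden = missing_value(target_mean=6, count=5, known_values=[4, 9, 5, 7])

print(hidden)
```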

Outliers and their effect on the mean
The school athletics team take part in an inter-schools competition. James’s shot results (in metres) are below. Discuss: What is the mean throw? Is this a fair representation of James’s ability? Explain. What would be a fair way for the competition to operate? The mean is (total of the seven throws) ÷ 7 = 8.81 m. The best throw could be used, or the worst score removed. A data item that is significantly higher or lower than the other items is called an outlier. Outliers affect the mean, by reducing or increasing it.

Outliers and their effect on the mean
Here are some 1500 metre race results in minutes. Discuss: Are there any outliers? Will the mean be increased or reduced by the outlier? Calculate the mean with the outlier. Now calculate the mean without the outlier. How much does it change? The mean with the outlier is 59.1 ÷ 9 = 6.57 minutes. The mean without the outlier is (total of the remaining eight times) ÷ 8 = 6.06 minutes. Another example of when an outlier would occur is an experiment on reaction time, where an anomalous result (e.g. if the participant’s hand slips) will be very large compared with the rest of the results. It may be appropriate in research or experiments to remove an outlier before carrying out analysis of results.
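A minimal sketch of the same comparison is shown below. The race times are hypothetical (the original results are not reproduced in this transcript), with 9.0 minutes playing the role of the outlier.

```python
from statistics import mean

# Hypothetical 1500 m times in minutes, invented for illustration; 9.0 is the outlier.
times = [4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 9.0]
outlier = 9.0

mean_with = mean(times)
mean_without = mean([t for t in times if t != outlier])

print(round(mean_with, 2), round(mean_without, 2))
```

A single slow time pulls the mean up by about 0.4 minutes here, even though eight of the nine runners finished within 0.7 minutes of each other.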

D2.3 Calculating the mean from frequency tables

Calculating the mean from a frequency table
Here are the results of a survey carried out among university students. If you were to write out the whole list of results, what would it look like?

| Number of sports played | Frequency |
| --- | --- |
| 0 | 20 |
| 1 | 17 |
| 2 | 15 |
| 3 | 10 |
| 4 | 9 |
| 5 | 3 |
| 6 | 2 |

What do you think the mean will be?

Pupils may benefit from calculating the mean from the list before they are ready to appreciate the multiplication method.

Calculating the mean from a frequency table
| Number of sports played | Frequency | Number of sports × frequency |
| --- | --- | --- |
| 0 | 20 | 0 × 20 = 0 |
| 1 | 17 | 1 × 17 = 17 |
| 2 | 15 | 2 × 15 = 30 |
| 3 | 10 | 3 × 10 = 30 |
| 4 | 9 | 4 × 9 = 36 |
| 5 | 3 | 5 × 3 = 15 |
| 6 | 2 | 6 × 2 = 12 |
| TOTAL | 76 | 140 |

Mean = 140 ÷ 76 ≈ 1.84, which is 2 sports to the nearest whole number.

Ask pupils to estimate the mean first. Discuss a suitable level of accuracy for rounding off in the context of discrete data.
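The multiplication method can be sketched in a few lines of Python. The frequencies are those from the slide; the variable names are illustrative.

```python
# Number of sports played -> frequency, taken from the slide.
frequency = {0: 20, 1: 17, 2: 15, 3: 10, 4: 9, 5: 3, 6: 2}

total_students = sum(frequency.values())
# Multiply each value by how often it occurs, then add the products.
total_sports = sum(value * count for value, count in frequency.items())
mean_sports = total_sports / total_students

print(total_students, total_sports, round(mean_sports, 2))
```

This gives the same answer as writing out all 76 results in a list and adding them, but with far less work.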

Grouped data Here are the Year Ten boys’ javelin scores.
| Javelin distances in metres | Frequency |
| --- | --- |
| 5 ≤ d < 10 | 1 |
| 10 ≤ d < 15 | 8 |
| 15 ≤ d < 20 | 12 |
| 20 ≤ d < 25 | 10 |
| 25 ≤ d < 30 | 3 |
| 30 ≤ d < 35 | 1 |
| 35 ≤ d < 40 | 1 |
| TOTAL | 36 |

How could you calculate the mean from this data? How is the data different from the previous examples you have calculated with?

The data has been grouped. Because the data is grouped, we do not know individual scores. It is not possible to add up the scores.

Midpoints

| Javelin distances in metres | Frequency |
| --- | --- |
| 5 ≤ d < 10 | 1 |
| 10 ≤ d < 15 | 8 |
| 15 ≤ d < 20 | 12 |
| 20 ≤ d < 25 | 10 |
| 25 ≤ d < 30 | 3 |
| 30 ≤ d < 35 | 1 |
| 35 ≤ d < 40 | 1 |

It is possible to find an estimate for the mean. This is done by finding the midpoint of each group.

To find the midpoint of the group 10 ≤ d < 15:

10 + 15 = 25

25 ÷ 2 = 12.5 m

Find the midpoints of the other groups. The other midpoints are displayed on the next page.

Point out the link between the midpoint and the median/mean. Discuss the fact that it is likely that the scores within a group are evenly distributed, i.e. half above and half below the midpoint. This is the best assumption to make, although it is obviously not always true. (The greater the data set, the more likely this is to be the case.) Some pupils may point out that 15 is not included in the group 10 ≤ d < 15; however, since a distance can get very close to 15, this will make no difference to an estimated mean.

Estimating the mean from grouped data
| Javelin distances in metres | Frequency | Midpoint | Frequency × midpoint |
| --- | --- | --- | --- |
| 5 ≤ d < 10 | 1 | 7.5 | 1 × 7.5 = 7.5 |
| 10 ≤ d < 15 | 8 | 12.5 | 8 × 12.5 = 100 |
| 15 ≤ d < 20 | 12 | 17.5 | 12 × 17.5 = 210 |
| 20 ≤ d < 25 | 10 | 22.5 | 10 × 22.5 = 225 |
| 25 ≤ d < 30 | 3 | 27.5 | 3 × 27.5 = 82.5 |
| 30 ≤ d < 35 | 1 | 32.5 | 1 × 32.5 = 32.5 |
| 35 ≤ d < 40 | 1 | 37.5 | 1 × 37.5 = 37.5 |
| TOTAL | 36 | | 695 |

Estimated mean = 695 ÷ 36 = 19.3 m (to 1 d.p.)

Ask pupils to estimate the mean first. Discuss a suitable level of accuracy for rounding off in the context of continuous data.
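The midpoint method for grouped data can be sketched as below, using the class intervals and frequencies from the slide. Each group is stored as a (lower bound, upper bound) pair with its frequency; the variable names are illustrative.

```python
# ((lower bound, upper bound), frequency) for each javelin distance group, from the slide.
groups = [
    ((5, 10), 1),
    ((10, 15), 8),
    ((15, 20), 12),
    ((20, 25), 10),
    ((25, 30), 3),
    ((30, 35), 1),
    ((35, 40), 1),
]

total_frequency = sum(freq for _, freq in groups)
# Midpoint of each group is (lower + upper) / 2; weight it by the group's frequency.
weighted_total = sum(freq * (low + high) / 2 for (low, high), freq in groups)
estimated_mean = weighted_total / total_frequency

print(total_frequency, weighted_total, round(estimated_mean, 1))
```

The result is only an estimate, because it assumes the distances in each group are spread evenly around the midpoint.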

How accurate is the estimated mean?
Here are the javelin distances thrown by Year 10 before the data was grouped. 35.00 31.05 28.89 25.60 25.33 24.11 23.50 21.82 21.78 21.77 21.60 21.00 20.70 20.20 20.00 19.50 18.82 17.35 17.31 16.64 15.79 15.75 15.69 15.52 15.25 15.00 14.50 12.80 12.50 12.00 11.85 10.00 9.50 Work out the mean from the original data above and compare it with the estimated mean found from the grouped data. Emphasise that although the estimated mean can be quite accurate it is preferable to use the original data if this is available. This can be particularly relevant in GCSE coursework. The estimated mean is 19.3 metres (to 1 d.p.). The actual mean is 18.7 metres (to 1 d.p.). How accurate was the estimated mean?


Calculate the median of the 1500 m results.
The median is the middle number when all numbers are in order. Calculate the median of the 1500 m results. Write the results in order and find the middle value: The median is not affected by the value of the outlier. Why is this a more appropriate average than the mean for these results?
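The claim that the median is not affected by the value of the outlier can be illustrated with a short sketch. The times below are hypothetical, since the original 1500 m results are not reproduced in this transcript; the same eight typical times are combined with two different outlier values.

```python
from statistics import mean, median

# Hypothetical typical times in minutes, invented for illustration.
typical = [4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6]

results = {}
for outlier in (9.0, 20.0):
    times = typical + [outlier]
    # Store (mean, median) for each choice of outlier.
    results[outlier] = (round(mean(times), 2), median(times))

print(results)
```

Making the outlier more extreme changes the mean considerably, while the middle value of the ordered list stays the same.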

Choosing the most appropriate average
What are the mean and median for these sets of attendance figures for three lunchtime activities? 23 22 21 20 19 18 17 Choir Drama club 29 28 25 Orchestra Explain your answers. To decide which of the three activities is the most popular, which average is a better one to use? Why?

For the drama club and choir, the means and medians are all 20. In the choir, the numbers are evenly distributed around the middle. For the orchestra, the mean is 157 ÷ 7 = 22.4 and the median is 20. The mean is pushed up by the high figures at the end. The mean is better because it takes account of all the data; there is not just one outlier so it is appropriate to include all data. The orchestra is clearly the most popular on average, but the median does not show this.

Outliers and the median and mean
This activity illustrates the way that outliers affect the mean but do not affect the median. For each set of data, say which would be the most appropriate average and why.

When there are two middle numbers
Here are 10B girls’ long jump results in metres. How could you work out the median jump? If there are two middle numbers, you need to find what is halfway between them. If the numbers are far apart, a quick way to find the middle of those two numbers is to add them up and divide by two.

2.80 m + 3.32 m = 6.12 m

6.12 m ÷ 2 = 3.06 m

The median is 3.06 m.

Finding halfway between two numbers
Reset to generate different examples. Sometimes it will be the midway number that needs to be found; but on other occasions, this will be given and one of the two endpoints will be hidden.

One or two middle numbers?
If there are 9 numbers in a list, will there be 1 or 2 middle numbers? If there are 10 numbers in a list, will there be 1 or 2 middle numbers? If there is an even number of numbers in a list, there will be two middle numbers. Discuss what the median is in each case. If there is an odd number of numbers in a list, there will be one middle number.

When there are two middle numbers
To find the position of the middle number in a very long list, call the number of numbers n. The middle number is then the (n + 1) ÷ 2 th number in the list.

For example: There are 100 numbers in a list. Where is the median? 101 ÷ 2 = 50.5, so the median is at the 50.5th position in the list (halfway between the 50th and the 51st numbers).

There are 37 numbers in a list. Where is the median? 38 ÷ 2 = 19, so the median is the 19th number in the list.

Pupils should first predict whether there will be one or two middle numbers. They should see the connection: an odd number divided by 2 always gives an answer ending in .5, so when n + 1 is odd (that is, when n is even) there will be two middle numbers. Remind pupils that these numbers are not the medians; they are just the positions of the medians in the list.
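The position rule can be turned into a small sketch. The function names `median_position` and `median_from_position` are hypothetical, and the short test lists are invented; the positions for lists of 100 and 37 numbers match the examples above.

```python
def median_position(n):
    """1-based position of the median in an ordered list of n values."""
    return (n + 1) / 2

def median_from_position(values):
    """Find the median by locating its position in the ordered list."""
    ordered = sorted(values)
    position = median_position(len(ordered))
    if position.is_integer():
        return ordered[int(position) - 1]     # one middle number
    lower = ordered[int(position) - 1]        # value just before the halfway point
    upper = ordered[int(position)]            # value just after the halfway point
    return (lower + upper) / 2                # halfway between the two middle numbers

print(median_position(100), median_position(37))
print(median_from_position([7, 1, 5, 3]), median_from_position([7, 1, 5]))
```

Note that `median_position` returns a position in the list, not the median itself, which is the distinction the notes above ask pupils to keep in mind.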

Where is the median? Discuss how to find the median; press reset to use a new data set. Sometimes there will be an odd number and sometimes an even number of items.


The range Here are the high jump scores for two girls in metres.
Joanna 1.62 1.41 1.35 1.20 1.15 Kirsty 1.59 1.45 1.30 Find the range for each girl’s results and use this to find out who is consistently better. Joanna’s range = 1.62 – 1.15 = 0.47 Discuss the consistency of the two jumpers: Joanna has the highest score but Kirsty is more consistent. Kirsty’s range = 1.59 – 1.30 = 0.29

The range The highest and lowest scores can be useful in deciding who is more consistent. The lowest score subtracted from the highest score is called the range. Remember that the range is not an average, but a measure of spread. If the scores are spread out then the range will be higher and the scores less consistent. If the scores are close together then the range will be lower and the scores more consistent.

The range Joanna 1.62 1.41 1.35 1.20 1.15 Kirsty 1.59 1.45 1.30
Calculate the mean and the range for each girl.

| | Joanna | Kirsty |
| --- | --- | --- |
| Mean | 1.35 m | 1.41 m |
| Range | 0.47 m | 0.29 m |

Use these results to decide which one you would enter into the athletics competition and why.

Discuss the consistency of the two jumpers: Joanna has the highest score but Kirsty is more consistent. Kirsty also has a higher mean. Performing well under pressure is a very important skill for athletes, and so Kirsty may be a better choice. You might also want to use words like “reliable” in this context.
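Joanna's figures can be checked with the sketch below, using her five jumps as listed on the slide. Kirsty's list appears incomplete in this transcript, so only her summary values from the slide (mean 1.41 m, range 0.29 m) are used for the comparison.

```python
# Joanna's high jump scores in metres, as listed on the slide.
joanna = [1.62, 1.41, 1.35, 1.20, 1.15]

joanna_mean = round(sum(joanna) / len(joanna), 2)
joanna_range = round(max(joanna) - min(joanna), 2)   # rounded to avoid floating-point noise

# Kirsty's summary values quoted on the slide (her full list is not available here).
kirsty_mean, kirsty_range = 1.41, 0.29

more_consistent = "Kirsty" if kirsty_range < joanna_range else "Joanna"
higher_mean = "Kirsty" if kirsty_mean > joanna_mean else "Joanna"

print(joanna_mean, joanna_range, more_consistent, higher_mean)
```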

Calculating the mean, median and range
Each time the activity is reset a new set of data is generated. Calculate the mean, median and range. This could be done as a competition in teams or with mini whiteboards. If required, ask a volunteer to come to the board and use the pen tool to write the given data set in order first.

Comparing sets of data Here is a summary of Chris and Rob’s performance in the 200 metres over a season. They each ran 10 races.

| | Chris | Rob |
| --- | --- | --- |
| Mean | 24.8 seconds | 25.0 seconds |
| Range | 1.4 seconds | 0.9 seconds |

Which of these conclusions are correct?

1. Robert is more reliable.
2. Robert is better because his mean is higher.
3. Chris is better because his range is higher.
4. Chris must have run a better time for his quickest race.
5. On average, Chris is faster but he is less consistent.

The first and the last statements are correct. The second statement is not correct because a higher mean means he is slower. The third statement is incorrect because a high range means he is inconsistent. The data on the next page will illustrate why the fourth statement is not always correct.

Comparing sets of data

| | Chris | Rob |
| --- | --- | --- |
| Mean | 24.8 seconds | 25.0 seconds |
| Range | 1.4 seconds | 0.9 seconds |

Here is the original data for Chris and Rob: 24.4 24.5 24.6 24.9 25.0 25.1 25.8 24.3 25.2

Use the summary table above to decide which data set is Chris’s and which is Rob’s. Who has the best time? Who has the worst time?

The first set of data is Chris’s and the second Rob’s. The data illustrates why the fourth statement is not always correct: Chris has the better (lower) mean time, but Rob has the best time of 24.3 seconds.

Comparing hurdles scores
Here are the top eleven hurdles scores in seconds for Year 9 and Year 10.

Year 9: 12.1 14.0 15.3 15.4 15.6 15.7 16.1 16.7 17.0

Year 10: 12.3 13.7 15.5 15.6 15.9 16.0 16.1 17.1 22.9

Work out the mean and range.

| | Year 9 | Year 10 |
| --- | --- | --- |
| Mean | 15.4 | 16.1 |
| Range | 4.9 | 10.6 |

Which year group do you think is better and why? Why might Year 10 feel the comparison is unfair?

Discuss the fact that the extreme value 22.9 seconds significantly affects the mean and range for Year 10.

Finding the interquartile range
The time of 22.9 seconds is an outlier. When there are outliers in the data, it is more appropriate to calculate the interquartile range. The interquartile range is the range of the middle half of the data. The lower quartile is the data value that is a quarter of the way along the list. The upper quartile is the data value that is three quarters of the way along the list. interquartile range = upper quartile – lower quartile

Locating the upper and lower quartiles
There are 11 times in each list.

Year 9: 12.1 14.0 15.3 15.4 15.6 15.7 16.1 16.7 17.0

Year 10: 12.3 13.7 15.5 15.6 15.9 16.0 16.1 17.1 22.9

Where is the median in each list? Where is the lower quartile in each list? Where is the upper quartile in each list?

Interquartile range for Year 9: 16.1 – 15.3 = 0.8

Interquartile range for Year 10: 16.1 – 15.5 = 0.6

Establish that if there are 11 data values, the median will be the 6th value in the list, leaving 5 values on either side. The lower and upper quartiles are in the middle of these 5 remaining values on either side. Note that the median, lower quartile and upper quartile will only be actual values in the data set when the number of values in the data set is (4n – 1), where n is a positive whole number. When the number of values is not of this form, it is acceptable to approximate the upper and lower quartiles using the closest value. The interquartile range for Year 10 is smaller than the interquartile range for Year 9. Pupils could be asked to construct a box-and-whisker diagram to compare this data. Link: D4.5 Box-and-whisker diagrams.

The location of quartiles in an ordered data set
When there are n values in an ordered data set:

- The lower quartile = (n + 1) ÷ 4 th value
- The median = (n + 1) ÷ 2 th value
- The upper quartile = 3(n + 1) ÷ 4 th value
- The interquartile range = the upper quartile – the lower quartile

The median, lower quartile and upper quartile can also be estimated from a cumulative frequency graph. Links: D4.4 Using cumulative frequency graphs, D4.5 Box-and-whisker diagrams.
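The position formulas can be sketched as follows. The 11-value data set is hypothetical (invented for illustration, with 30 as an outlier), and `quartile_positions` is an illustrative name. The simple indexing used here assumes the positions come out as whole numbers, which happens when the number of values is one less than a multiple of 4, as with 11 values.

```python
def quartile_positions(n):
    """1-based positions of the lower quartile, median and upper quartile."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4

# Hypothetical ordered data set with 11 values, invented for illustration.
ordered = [12, 14, 15, 15, 16, 16, 17, 18, 19, 21, 30]

lq_pos, median_pos, uq_pos = quartile_positions(len(ordered))
lower_quartile = ordered[int(lq_pos) - 1]     # 3rd value
upper_quartile = ordered[int(uq_pos) - 1]     # 9th value
interquartile_range = upper_quartile - lower_quartile
full_range = ordered[-1] - ordered[0]

print((lq_pos, median_pos, uq_pos), interquartile_range, full_range)
```

In this made-up example the outlier stretches the range to 18, while the interquartile range stays at 4, which is why the interquartile range is preferred when outliers are present.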

Finding the interquartile range
Use the activity to discuss how to find the interquartile range. Pressing the play button reveals how this is found. Resetting will produce a new data set. Link this activity to box-and-whisker diagrams by also finding the minimum value, the maximum value and the median and asking pupils to construct the corresponding box-and-whisker diagram. Links: D4.4 Using cumulative frequency graphs, D4.5 Box-and-whisker diagrams.

Review To review the work you have covered in this topic:
1) Play “Guess the word”. Write out the key words on cards. Shuffle the cards. Describe the word on each card to your partner. Your partner must guess the word. Do as many as you can in one minute, then swap over.
2) Make up challenges involving sets of data for your partner, such as working out the mean.
3) Make a list of possible mistakes to avoid in this topic.