# D4 Moving averages and cumulative frequency

## Presentation on theme: "D4 Moving averages and cumulative frequency"— Presentation transcript:

KS4 Mathematics: D4 Moving averages and cumulative frequency

Contents
- D4.1 Moving averages
- D4.2 Plotting moving averages
- D4.3 Cumulative frequency
- D4.4 Using cumulative frequency graphs
- D4.5 Box-and-whisker diagrams

Stop complaining!

Tabina’s friends claim that she is always complaining and decide to keep a record of how many times she is heard complaining every day for five weeks. These are the results:

[Table: number of complaints recorded each day for 35 days]

They agree to give Tabina a prize if she can stop complaining for a whole week. Should she get a prize?

A printout of the data would be useful. Pupils could discuss this in pairs. Assume Monday is the first day of the week. They should notice that some groups of seven consecutive days containing only 0s do not start on a Monday, which could justify awarding the prize for seven 0s in a row.

Is it fair to consider only Monday to Sunday?
Groups of seven

There are lots of groups of seven days in the data. What if you included Sunday to Saturday, Tuesday to Monday, Wednesday to Tuesday and so on?

[Animation: each group of seven consecutive days in the data is highlighted in turn]

Click in the box to begin. Each group of seven is highlighted in turn to show that there are 29 groups of seven altogether. This emphasises the idea of a moving set of data, which is what the moving average uses.

The moving average

We could calculate the mean for every group of seven. How could this help us decide whether Tabina should get a reward? How many of the means will be 0? What method would you use to calculate the means?

The means of each group of seven are collectively called a seven-point moving average.

There will be two means of 0.

Calculating a seven-point moving average
The means (to 2 decimal places) for each of the 29 groups of 7 are as follows:

2.86, 2.43, 2.14, 2.14, 1.57, 1.57, 1.00, 0.29, 0.00, 0.43, 1.14, 1.29, 2.14, 2.14, 2.14, 2.14, 2.14, 1.86, 2.00, 1.29, 1.29, 1.29, 1.29, 0.86, 0.43, 0.14, 0.00, 0.14, 0.29

What can the moving average tell us about the general pattern of Tabina’s behaviour and whether she should win the prize?

Pupils could work out some of the averages; perhaps the task could be divided up among the class. Clicking on the first item in a group highlights that group and displays its mean.

Moving averages Use this activity to demonstrate how to find the seven-point moving average for various sets of data. Emphasise that the increase in the mean is much smaller than the jump from one number to the next in the raw data. In other words, the effect of an individual outlier is “smoothed out” in the means.
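The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `moving_average` and the 14-day list of counts are made up for the example (they are not Tabina’s data), and the list deliberately contains one outlier to show the smoothing effect.

```python
def moving_average(values, window):
    """Return the mean of every run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Hypothetical daily counts (NOT the data from the slides): one outlier of 9,
# followed by a full week of zeros.
counts = [2, 3, 1, 2, 9, 2, 1, 0, 0, 0, 0, 0, 0, 0]
means = moving_average(counts, 7)

print(len(means))                      # number of groups of seven: 14 - 7 + 1
print([round(m, 2) for m in means])    # the seven-point moving average
```

A list of n values always gives n − window + 1 means, which is why 35 days produce 29 groups of seven. A mean of exactly 0 can only occur when all seven values in the group are 0.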

D4.2 Plotting moving averages

A graph showing number of complaints each day
This graph shows the number of times Tabina complains each day.

[Graph: number of complaints (vertical axis) against days 1 to 35 (horizontal axis)]

How well does this graph illustrate the general trend in Tabina’s behaviour?

This graph does not show the general trend, which is to complain less as time goes on. The random fluctuations (the causes of which pupils may enjoy speculating about) mask the overall improvement in behaviour. The moving average on the next slide gives a better idea of this. The lines on the graph are dotted because the data is discrete (Tabina’s complaints are only counted up once a day).

A graph showing number of complaints each day
A line graph that shows how a value changes over time is called a time series. To smooth out the fluctuations in this time series we can plot the moving average:

[Graph: number of complaints (vertical axis) against days 1 to 35 (horizontal axis), with the seven-point moving average plotted on the same axes]

Ask pupils to comment on the difference between the green and orange lines. The random fluctuations are smoothed out by the moving average, since any individual outlier within a group of 7 has less impact on the average than it does on its own. Ask pupils how the graph relates back to the original question of whether Tabina is entitled to a prize: each group of seven 0s is indicated by a mean of 0, which occurs twice. Discuss where each mean is plotted: ask why the first mean is not at the beginning of the graph.

Plotting moving averages on a time series graph
When we plot the moving average, each mean is plotted halfway along the group that it represents. For our seven-point moving average we would have:

| Range | Method | Position of mean on graph |
|---|---|---|
| 1st – 7th | (1 + 7) ÷ 2 | 4 |
| 2nd – 8th | (2 + 8) ÷ 2 | 5 |
| 3rd – 9th | (3 + 9) ÷ 2 | 6 |
| 29th – 35th | (29 + 35) ÷ 2 | 32 |

The mean is always plotted halfway along the group, so finding its position means calculating the midpoint of the group.
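As a rough check on the midpoint rule, the sketch below lists the plotting position for every group. The function name `mean_positions` is an assumption made for this example, and days are numbered from 1 as on the graph.

```python
def mean_positions(n_points, window):
    """Midpoint of each group of `window` consecutive points, numbering from 1."""
    return [
        (first + (first + window - 1)) / 2     # (first + last) / 2
        for first in range(1, n_points - window + 2)
    ]


positions = mean_positions(35, 7)
print(positions[:3], positions[-1])   # [4.0, 5.0, 6.0] 32.0
```

The first mean sits at day 4 and the last at day 32, so the moving-average line is shorter than the raw data line at both ends.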

Comparing sets of data

Here are the attendance records for two hip hop dance classes over ten weeks.

Class A 28 30 27 29 21

Class B 26 29 28 30 27 25

Draw line graphs for each class to represent the changes in attendance.

Remind pupils that time always goes on the horizontal axis. They could start the vertical axis at 0, or higher. Discuss how this choice affects the apparent size of the fluctuations. What do pupils think the reason for the dip in Class A in the fourth week could be?

Calculating a five-point moving average
We can smooth out the fluctuations for each graph by calculating a five-point moving average.

Means for class A: 27.0, 26.8, 27.0, 26.8, 28.6, 28.4

Means for class B: 28.0, 27.6, 28.2, 28.6, 28.4, 27.6

Plotting a five-point moving average
Each mean is then plotted halfway along the group that it represents. For a five-point moving average we have:

| Range | Method | Position of mean on graph |
|---|---|---|
| 1st – 5th | (1 + 5) ÷ 2 | 3 |
| 2nd – 6th | (2 + 6) ÷ 2 | 4 |
| 3rd – 7th | (3 + 7) ÷ 2 | 5 |

These three examples should be sufficient for pupils to plot their own time series graph.

Time series for class A

[Graph: attendance (vertical axis, 20 to 30) against weeks 1 to 10 (horizontal axis)]

The graphs used begin their vertical scale at 20. This has the advantage of making the fluctuations clear, but could also be seen as a disadvantage since it distorts the size of the fluctuations.

Five-point moving average for class A
[Graph: attendance (vertical axis, 20 to 30) against weeks 1 to 10 (horizontal axis), with the five-point moving average for class A]

Reinforce the purpose of the moving average. Point out the effect of the fourth week on the original graph and on the means. How many means does it affect? Does it affect the general trend?

Time series for class B

[Graph: attendance (vertical axis, 20 to 30) against weeks 1 to 10 (horizontal axis)]

Five-point moving average for class B
[Graph: attendance (vertical axis, 20 to 30) against weeks 1 to 10 (horizontal axis), with the five-point moving average for class B]

Plotting the means for other moving averages
We can find the positions of other moving averages as follows:

| Size of moving average | Method | Position of first mean on graph |
|---|---|---|
| 3 | (3 + 1) ÷ 2 | 2 |
| 4 | (4 + 1) ÷ 2 | 2.5 |
| 5 | (5 + 1) ÷ 2 | 3 |
| 6 | (6 + 1) ÷ 2 | 3.5 |
| 7 | (7 + 1) ÷ 2 | 4 |
| 8 | (8 + 1) ÷ 2 | 4.5 |

Ask pupils to work out what is halfway between 1 and 3, 1 and 4, and so on. They should be able to generate the last column; then you can guide them to find the method in the middle column. Ask pupils if they notice any patterns in the table: the odd sizes produce integer positions, whereas the even sizes do not. (This can be related back to the median.)

The numbers in the end column are really positions relative to the numbers along the horizontal axis. For example, if the numbers were years (say 2000, 2001, etc.), then the numbers in the end column would have to be adapted.
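One way to adapt the rule to other axis labels is to measure from the first label rather than from 1. The sketch below assumes evenly spaced labels one unit apart; the function name `first_mean_position` and its `first_label` parameter are invented for this illustration.

```python
def first_mean_position(window, first_label=1):
    """Position of the first mean when the axis starts at `first_label`.

    With first_label = 1 this is the same as (window + 1) / 2.
    """
    return first_label + (window - 1) / 2


for size in range(3, 9):
    print(size, first_mean_position(size))

# If the horizontal axis shows years starting at 2000, a five-point moving
# average has its first mean plotted at the middle year of 2000 to 2004.
print(first_mean_position(5, first_label=2000))   # 2002.0
```

Even window sizes land halfway between two axis values (for example 2.5 for a four-point moving average), which matches the pattern in the table.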

D4.3 Cumulative frequency

Choosing class intervals
You are going to record how long each member of your class can keep their eyes open without blinking. How could this information be recorded? What practical issues might arise?

Time is an example of continuous data. You will have to decide how accurately to measure the times: to the nearest tenth of a second? to the nearest second? to the nearest five seconds?

The nearest second is most appropriate, but intervals of 5 seconds would provide adequate information. It depends on what the data is used for. For example, medical research might require more accurate information.

Keeping your eyes open

You will also have to decide what size class intervals to use. When continuous data is grouped into class intervals it is important that no values are missed out and that there are no overlaps.

For example, you may decide to use class intervals with a width of 5 seconds. If everyone keeps their eyes open for more than 10 seconds, the first class interval would be more than 10 seconds, up to and including 15 seconds. This is usually written as 10 < t ≤ 15, where t is the time in seconds.

The next class interval would be 15 < t ≤ 20.

Verify that there are no gaps or overlaps for the two class intervals given.
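The "no gaps, no overlaps" rule can be illustrated with a small sketch that places a time in its class interval. The function name `class_interval` and the default start of 10 seconds and width of 5 seconds are assumptions taken from the example above, not a fixed convention.

```python
import math


def class_interval(t, start=10, width=5):
    """Return (lower, upper) such that lower < t <= upper."""
    if t <= start:
        raise ValueError("t must be greater than the start of the first interval")
    # 1 for the first interval, 2 for the second, and so on.
    k = math.ceil((t - start) / width)
    upper = start + k * width
    return upper - width, upper


print(class_interval(12.3))   # (10, 15)
print(class_interval(15))     # (10, 15): 15 belongs to 10 < t <= 15
print(class_interval(15.1))   # (15, 20)
```

Because the upper boundary is included and the lower boundary is not, a time of exactly 15 seconds falls in one interval only.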

Cumulative frequency graph of results
Enter the data. Press the chart icon to see the graph plotted.

Cumulative frequency

Cumulative frequency is a running total. It is calculated by adding up the frequencies up to that point. Here are the results of 100 people holding their breath:

| Time in seconds | Frequency | Time in seconds | Cumulative frequency |
|---|---|---|---|
| 30 < t ≤ 35 | 9 | 0 < t ≤ 35 | 9 |
| 35 < t ≤ 40 | 12 | 0 < t ≤ 40 | 9 + 12 = 21 |
| 40 < t ≤ 45 | 24 | 0 < t ≤ 45 | 21 + 24 = 45 |
| 45 < t ≤ 50 | 28 | 0 < t ≤ 50 | 45 + 28 = 73 |
| 50 < t ≤ 55 | 16 | 0 < t ≤ 55 | 73 + 16 = 89 |
| 55 < t ≤ 60 | 11 | 0 < t ≤ 60 | 89 + 11 = 100 |

The right-hand column represents the cumulative frequency. For example, the second category of 0 < t ≤ 40 means 21 people held their breath for 40 seconds or less. Ask pupils for the cumulative frequencies and to explain what they mean, e.g. “What does the cumulative frequency 45 mean?” Answer: “45 people held their breath for 45 seconds or less.” Ask “What percentage of the group can hold their breath for 45 seconds or less?” and “How many can hold their breath for 60 seconds?” Discuss how you could find an estimate for the mean and median from the grouped data, and how to find the modal group.
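The running total can be reproduced with Python’s standard `itertools.accumulate`. The variable names below are chosen for this sketch; the frequencies are the ones from the breath-holding table.

```python
from itertools import accumulate

upper_bounds = [35, 40, 45, 50, 55, 60]   # upper boundary of each class interval
frequencies = [9, 12, 24, 28, 16, 11]     # frequency of each class interval

# Each cumulative frequency is the sum of all frequencies up to that point.
cumulative = list(accumulate(frequencies))

for upper, total in zip(upper_bounds, cumulative):
    print(f"0 < t <= {upper}: {total}")
```

The final cumulative frequency should always equal the total number of people, here 100, which is a quick check on the arithmetic.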

Finding averages using cumulative frequency
100 people took part in the experiment. From the table, how could you find exact values or estimates for: the mean? the mode or modal group? the median? the range?

To find a more accurate value for the median, a cumulative frequency graph can be used.

The mean would be an estimate; the median could only be identified as being within a group; the modal group could be found; the range could be estimated, e.g. by taking the midpoints of the lowest and highest groups. None of these is very satisfactory, which demonstrates the purpose of the cumulative frequency graph.
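A possible sketch of these estimates, using the frequencies from the breath-holding table, is shown below. It assumes the usual classroom convention of treating every value in a group as if it sat at the group’s midpoint, which is why the mean is only an estimate. The variable names are invented for the example.

```python
intervals = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
frequencies = [9, 12, 24, 28, 16, 11]
total = sum(frequencies)

# Estimated mean: pretend every value lies at the midpoint of its interval.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]
estimated_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / total

# Modal group: the interval with the largest frequency.
modal_interval = intervals[frequencies.index(max(frequencies))]

# Median group: the first interval whose running total reaches half the data.
running = 0
for interval, f in zip(intervals, frequencies):
    running += f
    if running >= total / 2:
        median_interval = interval
        break

print(estimated_mean)    # estimate only, based on midpoints
print(modal_interval)    # (45, 50)
print(median_interval)   # (45, 50)
```

The table alone places the median somewhere in 45 < t ≤ 50 but cannot say where, which is the gap the cumulative frequency graph fills.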

D4.4 Using cumulative frequency graphs

Cumulative frequency graphs
Here is the cumulative frequency table for 100 people holding their breath:

| Time in seconds | Cumulative frequency |
|---|---|
| 0 < t ≤ 35 | 9 |
| 0 < t ≤ 40 | 21 |
| 0 < t ≤ 45 | 45 |
| 0 < t ≤ 50 | 73 |
| 0 < t ≤ 55 | 89 |
| 0 < t ≤ 60 | 100 |

We can plot a cumulative frequency graph as follows:

Plotting a cumulative frequency graph
[Graph: cumulative frequency (vertical axis, 0 to 100) against time in seconds (horizontal axis, 30 to 60)]

The upper boundary for each class interval is plotted against its cumulative frequency. A smooth curve is then drawn through the points. We can use the graph to estimate the median by finding the time for the 50th person. This gives us a median time of 47 seconds.

Note that the first point plotted is the lower boundary of the first class interval, which has a cumulative frequency of 0. Point out the characteristic S-shape of the cumulative frequency curve. Ask pupils to use the graph to estimate the number of seconds the middle person held their breath for. This is technically the (100 + 1) ÷ 2 = 50.5th person, but since the graph is not accurate enough to measure this, it is appropriate to use the 50th person.

The interquartile range
Remember, the range is a measure of spread. It is the difference between the highest value and the lowest value. When the range is affected by outliers it is often more appropriate to use the interquartile range.

The interquartile range is the range of the middle 50% of the data. The lower quartile is the data item ¼ of the way along the list. The upper quartile is the data item ¾ of the way along the list.

interquartile range = upper quartile – lower quartile

Link: D2.5 Comparing data.

Finding the interquartile range
[Graph: cumulative frequency (vertical axis, 0 to 100) against time in seconds (horizontal axis, 30 to 60), with the 25th and 75th people marked]

The cumulative frequency graph can be used to locate the upper and lower quartiles and so find the interquartile range.

The lower quartile is the time of the 25th person: 42 seconds.

The upper quartile is the time of the 75th person: 51 seconds.

The interquartile range is the difference between these two values: 51 – 42 = 9 seconds.

The lower quartile for 100 people is technically the (100 + 1) ÷ 4th person and the upper quartile the 3(100 + 1) ÷ 4th person. The graph is not accurate enough to measure this and so it is appropriate to use the 25th and 75th person. Remind pupils this represents the range of the middle half of the data. Compare this with the range. Link: D2.5 Comparing data.
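Reading values from the graph can be imitated numerically. The sketch below is an approximation under a stated assumption: it joins the plotted points with straight lines (linear interpolation) instead of the smooth curve drawn on the slide, so its answers will not match the graph readings of 42, 47 and 51 exactly. It uses the frequencies from the breath-holding frequency table in D4.3; the function name `value_at_rank` is invented for the example.

```python
from itertools import accumulate

upper_bounds = [35, 40, 45, 50, 55, 60]
frequencies = [9, 12, 24, 28, 16, 11]

# Plotted points: (time, cumulative frequency), starting from the lower
# boundary of the first class interval with cumulative frequency 0.
points = [(30, 0)] + list(zip(upper_bounds, accumulate(frequencies)))


def value_at_rank(points, rank):
    """Estimate the time of the `rank`-th person by straight-line interpolation."""
    if not points[0][1] <= rank <= points[-1][1]:
        raise ValueError("rank is outside the plotted cumulative frequencies")
    if rank == points[0][1]:
        return float(points[0][0])
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 < rank <= c1:
            return x0 + (x1 - x0) * (rank - c0) / (c1 - c0)
    raise ValueError("cumulative frequencies must be increasing")


lower_quartile = value_at_rank(points, 25)
median = value_at_rank(points, 50)
upper_quartile = value_at_rank(points, 75)

print(round(lower_quartile, 1), round(median, 1), round(upper_quartile, 1))
print(round(upper_quartile - lower_quartile, 1))   # interquartile range
```

The straight-line estimates come out a little lower than the curve readings, which is a useful prompt for discussing how much accuracy a hand-drawn curve can really support.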

D4.5 Box-and-whisker diagrams

A box-and-whisker diagram
A box-and-whisker diagram, or boxplot, can be used to illustrate the spread of the data in a given distribution using the median, the lower quartile and the upper quartile. These values can be found from a cumulative frequency graph.

[Graph: cumulative frequency (vertical axis, 0 to 100) against time in seconds (horizontal axis, 30 to 60)]

For example, for this cumulative frequency graph showing the results of 100 people holding their breath:

Minimum value = 30
Lower quartile = 42
Median = 47
Upper quartile = 51
Maximum value = 60

The minimum and maximum values are not available from the grouped data. The lower bound of the lowest group and the upper bound of the highest group have been used here.

A box-and-whisker diagram
The corresponding box-and-whisker diagram is as follows:

[Box-and-whisker diagram: minimum value 30, lower quartile 42, median 47, upper quartile 51, maximum value 60]

Note that the boxes are drawn to scale, so that the relative positions of the lower quartile, median and upper quartile can be seen clearly. Discuss the position of the median: it is not halfway within the interquartile range, as it was within the full range. It is closer to the upper quartile than the lower quartile. The interquartile range is clearly much smaller than the full range. The class could draw a box-and-whisker diagram for the data collected on slide 25.

In which position in the list would the median lap time be?
Lap times

James takes part in karting competitions and his Dad records his lap times on a spreadsheet. One of the karting tracks is at Shenington. In 2004, 378 of James’ lap times were recorded. The track is 1108 metres long. James’ fastest time in a race was 51.8 seconds.

There are 378 lap times and so the median lap time will be the (378 + 1) ÷ 2 = 189.5th value ≈ 190th value.

Sometimes the number of data items does not divide equally into 4, and the positions of the median, lower quartile and upper quartile have to be rounded. This should be discussed. The median would be 379/2 = 189.5 ≈ 190th. The lower quartile would be 379/4 = 94.75 ≈ 95th. The upper quartile would be 379/4 × 3 = 284.25 ≈ 284th.

Lap times In which position in the list would the lower quartile be?
There are 378 lap times and so the lower quartile will be the (378 + 1) ÷ 4 = 94.75th value ≈ 95th value.

In which position in the list would the upper quartile be?

There are 378 lap times and so the upper quartile will be the 3 × (378 + 1) ÷ 4 = 284.25th value ≈ 284th value.
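The position calculations can be checked with a short sketch. It assumes the (n + 1) rule used on these slides and rounds halves upwards, as in 189.5 ≈ 190; other textbooks and software use slightly different quartile conventions. The function names are invented for the example.

```python
import math


def round_half_up(x):
    """Round to the nearest whole number, with .5 rounding up."""
    return math.floor(x + 0.5)


def quartile_positions(n):
    """Exact and rounded list positions using the (n + 1) rule."""
    exact = {
        "lower quartile": (n + 1) / 4,
        "median": (n + 1) / 2,
        "upper quartile": 3 * (n + 1) / 4,
    }
    return {name: (pos, round_half_up(pos)) for name, pos in exact.items()}


for name, (exact, rounded) in quartile_positions(378).items():
    print(f"{name}: {exact} -> position {rounded}")
```

`round_half_up` is used instead of Python’s built-in `round`, because `round` sends exact halves to the nearest even number rather than always upwards.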

Lap times at Shenington karting circuit
James’ lap times are displayed in the following cumulative frequency graph.

[Graph: cumulative frequency (vertical axis, 0 to 400) against lap times in seconds (horizontal axis, 52 to 92)]

This data was collected for James Peace during 2004 at the Shenington karting circuit. The data has been grouped in two-second intervals. Point out that faster lap times are smaller, i.e. 52 is his best lap time (to the nearest second) and 92 his worst. Most of his lap times are between 52 and 57 seconds. The graph tails off at the end because there are fewer slower lap times. Discuss what might cause these, e.g. the weather conditions.

Pupils would benefit from a printout of the graph. Discuss appropriate levels of accuracy for reading from the graph. It is not possible to read more accurately than the nearest 0.5 for the lap times, and the nearest 10 for the cumulative frequency.

Box and whisker plot for James’ race times
[Box-and-whisker diagram: minimum value 52, lower quartile 53, median 54, upper quartile 58, maximum value 91]

What conclusions can you draw about James’ performance?

Discuss the position of the median within the data and within the interquartile range. It is much closer to the faster times (since the data is skewed). The interquartile range is clearly much smaller than the full range, and a minority of lap times distort the data overall, demonstrating the usefulness of the interquartile range and the advantage of the median over the mean.

Comparing sets of data

Here are box-and-whisker diagrams representing James’ lap times and Shabnum’s lap times.

| Lap times | Minimum | Lower quartile | Median | Upper quartile | Maximum |
|---|---|---|---|---|---|
| James | 52 | 53 | 54 | 58 | 91 |
| Shabnum | 52 | 54 | 60 | 65 | 86 |

Who is better and why?

Both have the same fastest lap time, but James has a faster median. Shabnum’s interquartile range shows a greater spread of times but she has a smaller range overall.
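The comparison can be backed up by computing both measures of spread from the two five-number summaries. This is a minimal sketch; the class name `FiveNumberSummary` and its property names are invented for the illustration, and the values are the ones read from the two diagrams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveNumberSummary:
    minimum: float
    lower_quartile: float
    median: float
    upper_quartile: float
    maximum: float

    @property
    def full_range(self):
        return self.maximum - self.minimum

    @property
    def interquartile_range(self):
        return self.upper_quartile - self.lower_quartile


james = FiveNumberSummary(52, 53, 54, 58, 91)
shabnum = FiveNumberSummary(52, 54, 60, 65, 86)

for name, laps in (("James", james), ("Shabnum", shabnum)):
    print(name, "median:", laps.median,
          "IQR:", laps.interquartile_range,
          "range:", laps.full_range)
```

James has the smaller interquartile range (5 seconds against 11) and the faster median, while Shabnum has the smaller full range (34 seconds against 39), so the answer to "who is better" depends on which measure is given more weight.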