Presentation on theme: "D4 Moving averages and cumulative frequency"— Presentation transcript:
1 KS4 Mathematics: D4 Moving averages and cumulative frequency
2 Contents
D4 Moving averages and cumulative frequency
D4.1 Moving averages
D4.2 Plotting moving averages
D4.3 Cumulative frequency
D4.4 Using cumulative frequency graphs
D4.5 Box-and-whisker diagrams
3 Stop complaining!
Tabina’s friends claim that she is always complaining and decide to keep a record of how many times she is heard complaining every day for five weeks. These are the results: [table of daily counts for the 35 days; not preserved in this transcript]
They agree to give Tabina a prize if she can stop complaining for a whole week.
Should she get a prize?
Notes: A printout of the data would be useful. Pupils could discuss this in pairs. Assume Monday is the first day of the week. They should notice that there are some groups of seven that contain only 0s but do not start on a Monday, which could justify the prize being awarded for seven 0s in a row.
4 Groups of seven
There are lots of groups of seven days in the data.
Is it fair to consider only Monday to Sunday?
What if you included Sunday to Saturday, Tuesday to Monday, Wednesday to Tuesday and so on?
Notes: Each group of 7 is highlighted in turn to show that there are 29 groups of 7 altogether. This emphasises the idea of a moving set of data, which is what the moving average uses.
5 The moving average
We could calculate the mean for every group of seven.
How could this help us decide whether Tabina should get a reward?
How many of the means will be 0? There will be two means of 0.
What method would you use to calculate the means?
The means of each group of seven are collectively called a seven-point moving average.
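
The calculation described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `moving_average` and the list `complaints` are assumptions made for the example, and the daily counts are invented because Tabina's actual data is not reproduced in the transcript.

```python
def moving_average(values, n):
    """Return the n-point moving average of `values` as a list of means."""
    if n < 1 or n > len(values):
        raise ValueError("n must be between 1 and len(values)")
    # One mean per group of n consecutive values: len(values) - n + 1 groups.
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


# Hypothetical daily counts for five weeks (35 days); NOT Tabina's real data.
complaints = [3, 4, 2, 5, 1, 3, 2] * 2 + [0] * 8 + [1, 2, 0, 3, 1, 2, 1] + [0] * 6

means = moving_average(complaints, 7)
print(len(means))                     # number of seven-day groups
print([round(m, 2) for m in means])   # the seven-point moving average
print(means.count(0))                 # groups in which every day was 0
```

With 35 days there are 35 - 7 + 1 = 29 groups of seven, and a mean of 0 can only occur when all seven days in the group are 0, which is why counting the zero means answers the question about the prize.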
6 Calculating a seven-point moving average
The means (to 2 decimal places) for each of the 29 groups of 7 are as follows:
2.86, 2.4[…], 1.57, 1.00, 0.29, 0.00, 0.43, 1.14, 1.2[…], 2.14, 1.86, 2.00, 1.29, 1.29, 1.29, 1.29, 0.86, 0.43, 0.14, 0.00, 0.14, 0.29
What can the moving average tell us about the general pattern of Tabina’s behaviour and whether she should win the prize?
Notes: Pupils could work out some of the averages; perhaps the task could be divided up among the class. The mean for each group of seven is revealed by clicking on the first item in the group, and the group is highlighted.
7 Moving averages
Notes: Use this activity to demonstrate how to find the seven-point moving average for various sets of data. Emphasise that the increase in the mean is much smaller than the jump from one number to the next in the raw data. In other words, the effect of an individual outlier is “smoothed out” in the means.
8 D4.2 Plotting moving averages
9 A graph showing number of complaints each day
This graph shows the number of times Tabina complains each day.
[Line graph: number of complaints (vertical axis) against days 1 to 35 (horizontal axis).]
How well does this graph illustrate the general trend in Tabina’s behaviour?
Notes: This graph does not show the general trend, which is to complain less as time goes on. The random fluctuations (the causes of which pupils may enjoy speculating about) mask the overall improvement in behaviour. The moving average on the next slide gives a better idea of this. The lines on the graph are dotted because the data is discrete (Tabina’s complaints are only counted up once a day).
10 A graph showing number of complaints each day
A line graph that shows how a value changes over time is called a time series.
To smooth out the fluctuations in this time series we can plot the moving average:
[Line graph: number of complaints (vertical axis) against days 1 to 35 (horizontal axis), with the seven-point moving average plotted as a second line.]
Notes: Ask pupils to comment on the difference between the green and orange lines. The random fluctuations are smoothed out by the moving average, since any individual outlier within a group of 7 has less impact on the average than it does on its own. Ask pupils how the graph relates back to the original question of whether Tabina is entitled to a prize: each group of seven 0s is indicated by a mean of 0, which occurs twice. Discuss where each mean is plotted: ask why the first mean is not at the beginning of the graph.
11 Plotting moving averages on a time series graph
When we plot the moving average, each mean is plotted halfway along the group that it represents.
For our seven-point moving average we would have:
Range          Method           Position of mean on graph
1st – 7th      (1 + 7) ÷ 2      4
2nd – 8th      (2 + 8) ÷ 2      5
3rd – 9th      (3 + 9) ÷ 2      6
…              …                …
29th – 35th    (29 + 35) ÷ 2    32
Notes: The mean is always plotted halfway along the group. Calculating its position therefore involves calculating the midpoint.
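
As a rough sketch of the midpoint rule in the table, the function below lists the horizontal position of every mean. The name `mean_positions` and its parameters are illustrative assumptions, not part of the original slides.

```python
def mean_positions(n_values, window):
    """Horizontal position (1-based) at which each moving-average mean is plotted."""
    positions = []
    for first in range(1, n_values - window + 2):
        last = first + window - 1
        positions.append((first + last) / 2)  # midpoint of the group
    return positions


positions = mean_positions(35, 7)
print(positions[:3])   # positions for the groups 1st-7th, 2nd-8th, 3rd-9th
print(positions[-1])   # position for the group 29th-35th
```

For 35 days and a seven-point moving average this gives 29 positions running from day 4 to day 32, which is why the moving-average line starts later and ends earlier than the raw data.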
12 Comparing sets of data
Here are the attendance records for two hip hop dance classes of students over ten weeks.
[Attendance table; only these values survive in the transcript. Class A: 28, 30, 27, 29, 21. Class B: 26, 29, 28, 30, 27, 25.]
Draw line graphs for each class to represent the changes in attendance.
Notes: Remind pupils that time always goes on the horizontal axis. They could start the vertical axis at 0, or higher. Discuss the way this will affect the apparent size of the fluctuations. What do pupils think the reason for the dip in Class A in the fourth week could be?
13 Calculating a five-point moving average
We can smooth out the fluctuations for each graph by calculating a five-point moving average.
Means for class A: 27.0, 26.8, 27.0, 26.8, 28.6, 28.4
Means for class B: 28.0, 27.6, 28.2, 28.6, 28.4, 27.6
14 Plotting a five-point moving average
Each mean is then plotted halfway along the group that it represents.
For a five-point moving average we have:
Range        Method         Position of mean on graph
1st – 5th    (1 + 5) ÷ 2    3
2nd – 6th    (2 + 6) ÷ 2    4
3rd – 7th    (3 + 7) ÷ 2    5
…            …              …
Notes: These three examples should be sufficient for pupils to plot their own time series graph.
15 Time series for class A
[Line graph: attendance (vertical axis, 20 to 30) against weeks 1 to 10 (horizontal axis).]
Notes: The graphs used begin their vertical scale at 20. This has the advantage of making the fluctuations clear, but could also be seen as a disadvantage since it exaggerates the size of the fluctuations.
16 Five-point moving average for class A
[Line graph: attendance (vertical axis, 20 to 30) against weeks 1 to 10 (horizontal axis), with the five-point moving average.]
Notes: Reinforce the purpose of the moving average. Point out the effect of the fourth week on the original graph and on the means. How many means does it affect? Does it affect the general trend?
17 Time series for class B
[Line graph: attendance (vertical axis, 20 to 30) against weeks 1 to 10 (horizontal axis).]
18 Five-point moving average for class B
[Line graph: attendance (vertical axis, 20 to 30) against weeks 1 to 10 (horizontal axis), with the five-point moving average.]
19 Plotting the means for other moving averages
We can find the positions of other moving averages as follows:
Size of moving average    Method         Position of first mean on graph
3                         (3 + 1) ÷ 2    2
4                         (4 + 1) ÷ 2    2.5
5                         (5 + 1) ÷ 2    3
6                         (6 + 1) ÷ 2    3.5
7                         (7 + 1) ÷ 2    4
8                         (8 + 1) ÷ 2    4.5
Notes: Ask pupils to work out what is halfway between 1 and 3, 1 and 4, and so on. They should be able to generate the last column; then you can guide them to find the method in the middle column. Ask pupils if they notice any patterns in the table: odd-sized moving averages give whole-number positions, whereas even-sized ones do not. (This can be related back to the median.) The numbers in the end column are positions relative to the numbers along the horizontal axis. For example, if the numbers were years (say 2000, 2001, etc.), then the numbers in the end column would have to be adapted.
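
The note about adapting the positions when the horizontal axis shows years can be made concrete with a small sketch. The function name `first_mean_position` and its parameters `first_label` and `step` are assumptions introduced for illustration only.

```python
def first_mean_position(window, first_label=1, step=1):
    """Axis position of the first mean for a moving average of size `window`.

    `first_label` is the label of the first data point on the horizontal axis
    and `step` is the gap between consecutive labels (both illustrative).
    """
    return first_label + step * (window - 1) / 2


# Axis numbered 1, 2, 3, ...: same values as (size + 1) / 2 in the table.
for size in range(3, 9):
    print(size, first_mean_position(size))

# Axis labelled with years starting at 2000: a five-point average
# has its first mean plotted at the middle year of 2000-2004.
print(first_mean_position(5, first_label=2000))
```

When the axis starts at 1 this reduces to (size + 1) ÷ 2, so the table values are the special case `first_label=1`.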
20 D4.3 Cumulative frequency
21 Choosing class intervals
You are going to record how long each member of your class can keep their eyes open without blinking.
How could this information be recorded?
What practical issues might arise?
Time is an example of continuous data.
You will have to decide how accurately to measure the times: to the nearest tenth of a second? To the nearest second? To the nearest five seconds?
Notes: The results will be continuous data. The nearest second is most appropriate, but intervals of 5 seconds would provide adequate information. It depends on what the data is used for. For example, medical research might require more accurate information.
22 Keeping your eyes open
You will also have to decide what size class intervals to use.
When continuous data is grouped into class intervals it is important that no values are missed out and that there are no overlaps.
For example, you may decide to use class intervals with a width of 5 seconds.
If everyone keeps their eyes open for more than 10 seconds, the first class interval would be more than 10 seconds, up to and including 15 seconds.
This is usually written as 10 < t ≤ 15, where t is the time in seconds.
The next class interval would be 15 < t ≤ 20.
Notes: Verify that there are no gaps or overlaps for the two class intervals given.
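
One way to check that intervals of the form lower < t ≤ upper leave no gaps and no overlaps is to write a rule that places any time in exactly one class. The sketch below assumes the first class starts at 10 seconds and each class is 5 seconds wide, as in the example on the slide; the function name `class_interval` is hypothetical.

```python
import math


def class_interval(t, start=10, width=5):
    """Return (lower, upper) for the class with lower < t <= upper."""
    if t <= start:
        raise ValueError("t must be greater than the lower bound of the first class")
    # ceil sends a value exactly on a boundary (e.g. 15) to the class it closes.
    k = math.ceil((t - start) / width) - 1
    lower = start + k * width
    return lower, lower + width


print(class_interval(12.3))   # inside the first class
print(class_interval(15))     # boundary value: still in 10 < t <= 15
print(class_interval(15.1))   # just over the boundary: 15 < t <= 20
```

Because a boundary value such as 15 belongs only to the class it closes, every time above 10 seconds falls in exactly one interval.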
23 Cumulative frequency graph of results
Enter the data. Press the chart icon to see the graph plotted.
24 Cumulative frequency
Cumulative frequency is a running total. It is calculated by adding up the frequencies up to that point.
Here are the results of 100 people holding their breath:
Time in seconds    Frequency    Time in seconds    Cumulative frequency
30 < t ≤ 35        9            0 < t ≤ 35         9
35 < t ≤ 40        12           0 < t ≤ 40         9 + 12 = 21
40 < t ≤ 45        24           0 < t ≤ 45         21 + 24 = 45
45 < t ≤ 50        28           0 < t ≤ 50         45 + 28 = 73
50 < t ≤ 55        16           0 < t ≤ 55         73 + 16 = 89
55 < t ≤ 60        11           0 < t ≤ 60         89 + 11 = 100
Notes: The right-hand column represents the cumulative frequency. For example, the second category of 0 < t ≤ 40 means 21 people held their breath for 40 seconds or less. Ask pupils for the cumulative frequencies and to explain what they mean, e.g. “What does the cumulative frequency 45 mean?” Answer: “45 pupils held their breath for 45 seconds or less.” Ask “What percentage of the class can hold their breath for 45 seconds or less?” and “How many can hold their breath for 60 seconds?” Discuss how you could find an estimate for the mean and median from the grouped data, and how to find the modal group.
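
The running total can be reproduced with Python's standard `itertools.accumulate`. The frequencies below are the ones given in the frequency column on this slide; the variable names are illustrative.

```python
from itertools import accumulate

upper_bounds = [35, 40, 45, 50, 55, 60]   # upper boundary of each class, in seconds
frequencies = [9, 12, 24, 28, 16, 11]     # frequency column from the slide

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for upper, cf in zip(upper_bounds, cumulative):
    print(f"0 < t <= {upper}: {cf}")
```

The last cumulative frequency should always equal the total number of people, which gives a quick check on the arithmetic.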
25 Finding averages using cumulative frequency
100 people took part in the experiment.
From the table, how could you find exact values or estimates for: the mean? The mode or modal group? The median? The range?
To find a more accurate value for the median, a cumulative frequency graph can be used.
Notes: The mean would be an estimate; the median could only be identified as being within a group; the modal group could be found; the range could be estimated, e.g. by taking the midpoints of the lowest and highest groups. None of these is very satisfactory, which demonstrates the purpose of the cumulative frequency graph.
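
A minimal sketch of the estimates discussed here, using the class intervals and frequencies from the breath-holding frequency table (9, 12, 24, 28, 16, 11). Taking the midpoint of each class as representative of its values is the usual assumption behind an estimated mean; all variable names are illustrative.

```python
classes = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
frequencies = [9, 12, 24, 28, 16, 11]
total = sum(frequencies)

# Estimated mean: treat every value in a class as if it sat at the midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in classes]
estimated_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / total

# Modal group: the class with the highest frequency.
modal_class = classes[frequencies.index(max(frequencies))]

# Median group: the first class whose cumulative frequency reaches half the total.
running = 0
for cls, f in zip(classes, frequencies):
    running += f
    if running >= total / 2:
        median_class = cls
        break

# Estimated range from the midpoints of the lowest and highest groups.
estimated_range = midpoints[-1] - midpoints[0]

print(estimated_mean, modal_class, median_class, estimated_range)
```

The table only locates the median within a 5-second group, which is the limitation the cumulative frequency graph is meant to address.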
26 D4.4 Using cumulative frequency graphs
27 Cumulative frequency graphs
Here is the cumulative frequency table for 100 people holding their breath:
Time in seconds    Cumulative frequency
0 < t ≤ 35         9
0 < t ≤ 40         21
0 < t ≤ 45         45
0 < t ≤ 50         66
0 < t ≤ 55         85
0 < t ≤ 60         100
We can plot a cumulative frequency graph as follows:
28 Plotting a cumulative frequency graph
[Cumulative frequency graph: cumulative frequency (vertical axis, 0 to 100) against time in seconds (horizontal axis, 30 to 60).]
The upper boundary for each class interval is plotted against its cumulative frequency.
A smooth curve is then drawn through the points.
We can use the graph to estimate the median by finding the time for the 50th person.
This gives us a median time of 47 seconds.
Notes: Note that the first point plotted is the lower boundary of the first class interval, which has a cumulative frequency of 0. Point out the characteristic S-shape of the cumulative frequency curve. Ask pupils to use the graph to estimate the number of seconds the middle person held their breath for. This is technically the (100 + 1) ÷ 2 = 50.5th person, but since the graph is not accurate enough to measure this, it is appropriate to use the 50th person.
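
Reading a value off the graph can be approximated numerically. The sketch below joins the plotted points with straight lines instead of a smooth curve, so it is an approximation to the graphical method and not a reproduction of it. The points are the upper boundaries and cumulative frequencies from the table on slide 27, plus the starting point (30, 0); the function name `value_at` is hypothetical.

```python
def value_at(target, points):
    """Time at which the cumulative frequency reaches `target`,
    using straight-line interpolation between neighbouring points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    raise ValueError("target is outside the plotted range")


# (upper boundary in seconds, cumulative frequency)
points = [(30, 0), (35, 9), (40, 21), (45, 45), (50, 66), (55, 85), (60, 100)]

median_estimate = value_at(50, points)   # time for the 50th person
print(round(median_estimate, 1))
```

Straight-line interpolation puts the median a little above 46 seconds, slightly below the 47 seconds read from the hand-drawn curve; differences of this size are expected because the two methods join the points differently.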
29 The interquartile range
Remember, the range is a measure of spread. It is the difference between the highest value and the lowest value.
When the range is affected by outliers it is often more appropriate to use the interquartile range.
The interquartile range is the range of the middle 50% of the data.
The lower quartile is the data item ¼ of the way along the list.
The upper quartile is the data item ¾ of the way along the list.
interquartile range = upper quartile – lower quartile
Link: D2.5 Comparing data.
30 Finding the interquartile range
[Cumulative frequency graph: cumulative frequency (vertical axis, 0 to 100) against time in seconds (horizontal axis, 30 to 60).]
The cumulative frequency graph can be used to locate the upper and lower quartiles and so find the interquartile range.
The lower quartile is the time of the 25th person: 42 seconds.
The upper quartile is the time of the 75th person: 51 seconds.
The interquartile range is the difference between these two values: 51 – 42 = 9 seconds.
Notes: The lower quartile for 100 people is technically the (100 + 1) ÷ 4 = 25.25th person and the upper quartile the 3(100 + 1) ÷ 4 = 75.75th person. The graph is not accurate enough to measure this and so it is appropriate to use the 25th and 75th person. Remind pupils this represents the range of the middle half of the data. Compare this with the range. Link: D2.5 Comparing data.
31 D4.5 Box-and-whisker diagrams
32 A box-and-whisker diagram
A box-and-whisker diagram, or boxplot, can be used to illustrate the spread of the data in a given distribution using the median, the lower quartile and the upper quartile.
These values can be found from a cumulative frequency graph.
[Cumulative frequency graph: cumulative frequency (vertical axis, 0 to 100) against time in seconds (horizontal axis, 30 to 60).]
For example, for this cumulative frequency graph showing the results of 100 people holding their breath:
Minimum value = 30
Lower quartile = 42
Median = 47
Upper quartile = 51
Maximum value = 60
Notes: The minimum and maximum values are not available from the grouped data. The lower bound of the lowest group and the upper bound of the highest group have been used here.
33 A box-and-whisker diagram
The corresponding box-and-whisker diagram is as follows:
[Box-and-whisker diagram: minimum value 30, lower quartile 42, median 47, upper quartile 51, maximum value 60.]
Notes: Note that the boxes are drawn to scale, so that the relative positions of the lower quartile, median and upper quartile can be seen clearly. Discuss the position of the median: it is not halfway within the interquartile range, as it was within the full range. It is closer to the upper quartile than the lower quartile. The interquartile range is clearly much smaller than the full range. The class could draw a box-and-whisker diagram for the data collected on slide 25.
34 Lap times
James takes part in karting competitions and his Dad records his lap times on a spreadsheet.
One of the karting tracks is at Shenington. In 2004, 378 of James’ lap times were recorded.
The track is 1108 metres long. James’ fastest time in a race was 51.8 seconds.
In which position in the list would the median lap time be?
There are 378 lap times and so the median lap time will be the (378 + 1) ÷ 2 = 189.5th value ≈ 190th value.
Notes: Sometimes the number of data items does not divide equally into 4, and the positions of the median, lower quartile and upper quartile have to be rounded. This should be discussed. The median would be 379 ÷ 2 = 189.5 ≈ 190th. The lower quartile would be 379 ÷ 4 = 94.75 ≈ 95th. The upper quartile would be 379 ÷ 4 × 3 = 284.25 ≈ 284th.
35 Lap times
In which position in the list would the lower quartile be?
There are 378 lap times and so the lower quartile will be the (378 + 1) ÷ 4 = 94.75th value ≈ 95th value.
In which position in the list would the upper quartile be?
There are 378 lap times and so the upper quartile will be the 3 × (378 + 1) ÷ 4 = 284.25th value ≈ 284th value.
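
The position calculations for the 378 lap times can be sketched as follows, using the (n + 1) convention from these slides and rounding to the nearest whole position. The function name `quartile_positions` is an assumption for illustration.

```python
import math


def quartile_positions(n):
    """Exact and rounded positions of the quartiles in an ordered list of n values."""
    raw = {
        "lower quartile": (n + 1) / 4,
        "median": (n + 1) / 2,
        "upper quartile": 3 * (n + 1) / 4,
    }
    # floor(x + 0.5) rounds halves upwards, matching 189.5 -> 190 on the slide.
    return {name: (pos, math.floor(pos + 0.5)) for name, pos in raw.items()}


for name, (exact, rounded) in quartile_positions(378).items():
    print(f"{name}: {exact} -> position {rounded}")
```

`math.floor(pos + 0.5)` is used instead of the built-in `round` because `round` sends exact halves to the nearest even number, which would not always match the rounding-up convention used here.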
36 Lap times at Shenington karting circuit
James’ lap times are displayed in the following cumulative frequency graph.
[Cumulative frequency graph: cumulative frequency (vertical axis, 0 to 400) against lap times in seconds (horizontal axis, 52 to 92).]
Notes: This data was collected for James Peace (www.54racing.com) during 2004 at the Shenington karting circuit. The data has been grouped in two-second intervals. Point out that faster lap times are smaller, i.e. 52 is his best lap time (to the nearest second) and 92 his worst. Most of his lap times are between 52 and 57 seconds. The graph tails off at the end because there are fewer slower lap times. Discuss what might cause these, e.g. the weather conditions. Pupils would benefit from a printout of the graph. Discuss appropriate levels of accuracy for reading from the graph. It is not possible to read more accurately than the nearest 0.5 for the lap times, and the nearest 10 for the cumulative frequency.
37 Box and whisker plot for James’ race times
[Box-and-whisker diagram: minimum value 52, lower quartile 53, median 54, upper quartile 58, maximum value 91.]
What conclusions can you draw about James’ performance?
Notes: Discuss the position of the median within the data and within the interquartile range. It is much closer to the faster times (since the data is skewed). The interquartile range is clearly much smaller than the full range, and a minority of lap times distort the data overall, demonstrating the usefulness of the interquartile range and the advantage of the median over the mean.
38 Comparing sets of data
Here are box-and-whisker diagrams representing James’ lap times and Shabnum’s lap times.
[James’ lap times: minimum 52, lower quartile 53, median 54, upper quartile 58, maximum 91.]
[Shabnum’s lap times: minimum 52, lower quartile 54, median 60, upper quartile 65, maximum 86.]
Who is better and why?
Notes: Both have the same fastest lap time, but James has a faster median. Shabnum’s interquartile range shows a greater spread of times but she has a smaller range overall.
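
The comparison can be checked numerically. The five values for each driver below are taken from the two diagrams, read in increasing order as minimum, lower quartile, median, upper quartile and maximum; that assignment is an inference from the transcript, and the dictionary keys and the function name `spread` are illustrative.

```python
# Five-number summaries in seconds (assignment of values to labels is inferred).
james = {"min": 52, "lq": 53, "median": 54, "uq": 58, "max": 91}
shabnum = {"min": 52, "lq": 54, "median": 60, "uq": 65, "max": 86}


def spread(summary):
    """Range and interquartile range of a five-number summary."""
    return {
        "range": summary["max"] - summary["min"],
        "iqr": summary["uq"] - summary["lq"],
    }


for name, summary in (("James", james), ("Shabnum", shabnum)):
    s = spread(summary)
    print(f"{name}: median {summary['median']}, range {s['range']}, IQR {s['iqr']}")
```

On these figures James has the lower (faster) median and the smaller interquartile range, while Shabnum has the smaller overall range, which matches the points raised in the notes.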