Presentation on theme: "S1: Chapter 4 Representation of Data"— Presentation transcript:
1 S1: Chapter 4 Representation of Data
Dr J Frost
Last modified: 25th September 2014
2 Stem and Leaf recap
Put the following measurements into a stem and leaf diagram:
[Stem and leaf diagram not reproduced. Key: 2 | 1 means 2.1]
It may be quicker to first draw the stem and leaf diagram without the ordering, and then order each row.
Now find:
Mode = 4.7
Lower quartile = 3.6
Median = 4.05
Upper quartile = 4.7
3 Back-to-Back Stem and Leaf recap
[Back-to-back stem and leaf diagram of girls' and boys' pulse rates not reproduced. Key: 0 | 4 | 6 means 40 for girls and 46 for boys.]
The data above shows the pulse rates of boys and girls in a school. Comment on the results.
The back-to-back stem and leaf diagram shows that boys' pulse rates tend to be lower than girls'.
4 Box Plot recap
Box plots allow us to visually represent the distribution of the data.
Minimum: 3
Lower quartile: 15
Median: 17
Upper quartile: 22
Maximum: 27
How is the IQR represented in this diagram?
How is the range represented in this diagram?
5 Box Plots recap
Sketch a box plot to represent the given weights of cats:
5lb, 6lb, 7.5lb, 8lb, 8lb, 9lb, 12lb, 14lb, 20lb
Minimum: 5
Lower quartile: 7.5
Median: 8
Upper quartile: 12
Maximum: 20
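The quartiles on this slide can be reproduced with a short sketch. The slide does not say which quartile rule it uses, so the rule below is an assumption: work out q × n ÷ 4; if that is a whole number, average that item and the next one, otherwise round up and take that item. The function name `quartile` is hypothetical.

```python
def quartile(sorted_values, q):
    """Return the q-th quartile (q = 1, 2 or 3) of an ascending list.

    Assumed rule (not stated on the slide): position = q * n / 4.
    Whole number -> average that item and the next; otherwise round up.
    """
    n = len(sorted_values)
    if (q * n) % 4 == 0:
        k = (q * n) // 4  # 1-based position of the first of the two items
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # Rounding the position up gives 1-based position q*n//4 + 1,
    # which is 0-based index q*n//4.
    return sorted_values[(q * n) // 4]


cat_weights = [5, 6, 7.5, 8, 8, 9, 12, 14, 20]  # already in ascending order
lower_quartile = quartile(cat_weights, 1)
median = quartile(cat_weights, 2)
upper_quartile = quartile(cat_weights, 3)
print(lower_quartile, median, upper_quartile)  # 7.5 8 12
```

Under this assumed rule the nine cat weights give the same lower quartile, median and upper quartile as the slide.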
6 Outliers
An outlier is an extreme value.
More specifically, a value is generally treated as an outlier when it lies more than 1.5 IQRs below the lower quartile or above the upper quartile. (But you will be told in the exam if the rule differs from this.)
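As a minimal sketch of the 1.5 × IQR rule, the boundaries can be computed for the cat weights from the previous slide, using the quartiles given there (7.5 and 12). The function name `outlier_fences` is a hypothetical helper, not something from the slides.

```python
def outlier_fences(lower_quartile, upper_quartile, multiplier=1.5):
    """Return the (lower, upper) boundaries beyond which values are outliers."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - multiplier * iqr, upper_quartile + multiplier * iqr


cat_weights = [5, 6, 7.5, 8, 8, 9, 12, 14, 20]
low, high = outlier_fences(7.5, 12)  # IQR = 4.5
outliers = [w for w in cat_weights if w < low or w > high]
print(low, high, outliers)  # 0.75 18.75 [20]
```

So under this rule the 20lb cat would count as an outlier.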
7 Outliers
We can display outliers as crosses on a box plot. But if we have one, how do we display the marks for the minimum/maximum?
The maximum point is not an outlier, so it remains unchanged.
But we have points that are outliers at the lower end. This mark becomes the 'outlier boundary', rather than the minimum.
10 Comparing Box Plots
[Box plot comparing house prices in Croydon and Kingston-upon-Thames, on a scale from £100k to £450k, not reproduced.]
"Compare the prices of houses in Croydon with those in Kingston." (2 marks)
For 1 mark, one of:
The interquartile range of house prices in Kingston is greater than in Croydon.
The range of house prices in Kingston is greater than in Croydon.
i.e. something spread related.
For 1 mark:
The median house price in Kingston was greater than that in Croydon.
i.e. compare some measure of location (could be minimum, lower quartile, etc.)
11 Bar Charts vs Histograms
Histograms:
For continuous data. (Use this as a reason whenever you're asked to justify use of a histogram.)
Data divided into (potentially uneven) intervals.
[GCSE definition] Frequency given by area of bars.*
No gaps between bars.
Bar charts:
For discrete data.
Frequency given by height of bars.
[Example charts not reproduced: a histogram of frequency density against height (m) and a bar chart of frequency against shoe size.]
* Not actually true. We'll correct this in a sec.
12 Bar Charts vs Histograms
Still using the 'incorrect' GCSE formula:
Weight (w kg)    Frequency    Frequency Density
0 < w ≤ 10       40           4
10 < w ≤ 15      6            1.2
15 < w ≤ 35      52           2.6
35 < w ≤ 45      10           1
Frequency = Frequency Density × Class Width
[Histogram of frequency density against height (m) not reproduced. The bars have frequencies 40, 15, 25 and 30.]
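The frequency density column in the weight table on this slide can be checked with a few lines. This is only a sketch of the GCSE rule (frequency density = frequency ÷ class width); the variable names are made up for illustration.

```python
# Each row: (lower bound, upper bound, frequency) for weight w in kg.
weight_table = [
    (0, 10, 40),
    (10, 15, 6),
    (15, 35, 52),
    (35, 45, 10),
]

# GCSE rule: frequency density = frequency / class width.
densities = [freq / (upper - lower) for lower, upper, freq in weight_table]
print(densities)  # [4.0, 1.2, 2.6, 1.0]
```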
13 Area = frequency?
The area of each bar in fact isn't necessarily equal to the frequency. Actually:
Frequency ∝ Area of bar
i.e. Frequency = k × Area of bar
Similarly:
Frequency density ∝ Frequency ÷ Class width
However, we often let k = 1, so that the ∝ becomes an =, as we were allowed to assume at GCSE.
14 The key to almost every histogram question…
…This diagram!
Area → (× k) → Frequency
For a given histogram, there's some scaling to get from an area (whether the total area or the area of a particular bar) to the corresponding frequency.
Once you've worked out this scaling, any subsequent areas you calculate can be converted to frequencies.
15 Area = frequency?
There were 60 runners in a 100m race. The following histogram represents their times. Determine the number of runners with times above 14s.
[Histogram of frequency density against time (s) not reproduced.]
We first find what area represents the total frequency.
Total area = 24, so the scaling is k = 60 ÷ 24 = 2.5.
Then use this scaling along with the desired area.
Area = 4 × 1.5 = 6, so frequency = 6 × 2.5 = 15 runners.
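The two steps in this example (find the scaling from the total, then apply it to the area you want) can be sketched as below. The numbers 60, 24 and 4 × 1.5 come from the slide; the variable names are illustrative only.

```python
total_frequency = 60  # runners in the race
total_area = 24       # total area of the histogram, as given on the slide

# Scaling factor that converts any area on this histogram into a frequency.
k = total_frequency / total_area

# Area of the part of the histogram above 14s: width 4, height 1.5.
area_above_14 = 4 * 1.5
runners_above_14 = k * area_above_14
print(k, runners_above_14)  # 2.5 15.0
```

The same `k` can then be reused for any other region of the same histogram.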
16 Frequency Density = Frequency ÷ Class width?
Weight (to nearest kg)    Frequency
1-2                       4
3-6                       3
7-9                       (not shown)
Note the gaps!
We can use the complete set of information in the first row combined with the bar to again work out the correct 'scaling'.
[Histogram of frequency density not reproduced.]
17 May 2012
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.
(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample. (4 marks)
[Figure 2 not reproduced.]
We can make the frequency density scale what we like.
M1 A1: Determine what one small square or one large square is worth (i.e. work out the area → frequency scaling).
M1 A1: Use this to find the number of cars travelling > 35 mph.
18 May 2012
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.
(b) Estimate the value of the mean speed of the cars in the sample. (3 marks)
M1 M1: Use the histogram to construct the sum of speeds: (30 × … + … × 25 + …) ÷ 450
A1: Correct value = 28.8
Bro Tip: Whenever you are asked to calculate the mean, median or quartiles from a histogram, form a grouped frequency table. Use your scaling factor to work out the frequency of each bar.
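The histogram for this question is not available in the transcript, so the sketch below illustrates the method on different data: the weight table from the earlier 'Bar Charts vs Histograms' slide. It estimates a mean from a grouped frequency table by treating every value in a class as sitting at the class midpoint, which is the usual assumption for this kind of estimate.

```python
# Each row: (lower bound, upper bound, frequency), taken from the weight table
# on the earlier 'Bar Charts vs Histograms' slide, not from this exam question.
grouped_table = [
    (0, 10, 40),
    (10, 15, 6),
    (15, 35, 52),
    (35, 45, 10),
]

total_frequency = sum(freq for _, _, freq in grouped_table)

# Sum of (midpoint x frequency) over all classes.
weighted_sum = sum((lower + upper) / 2 * freq for lower, upper, freq in grouped_table)

estimated_mean = weighted_sum / total_frequency
print(total_frequency, weighted_sum, round(estimated_mean, 2))  # 108 1975.0 18.29
```

For the exam question itself, the frequencies would first be recovered from the bars using the area to frequency scaling, and the divisor would be 450.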
24 Skew
Skew gives a measure of whether the values are more spread out above the median or below the median.
[Two sketched distributions not reproduced: frequency against height and frequency against weight, each with the mode, median and mean marked.]
We say the first distribution has positive skew.
We say the second distribution has negative skew.
(To remember, think that the 'tail' points in the positive direction.)
25 Skew
Remember, think what direction the 'tail' is likely to point.
Distribution: Salaries in the UK. Skew: High salaries drag the mean up, so positive skew. Mean > Median.
Distribution: IQ. Skew: A symmetrical distribution, i.e. no skew. Mean = Median.
Distribution: Heights of people in the UK. Skew: Will probably be a nice 'bell curve', i.e. no skew. Mean = Median.
Distribution: Age of retirement. Skew: Likely to be people who retire significantly before the median age, but not many who retire significantly after. So negative skew. Mean < Median.
The way to remember which way round the mean and median go: the mean is 'dragged up' by a tail in the positive direction, so mean > median for positive skew. Picture salaries and you'll never forget!
26 Exam Question
In the previous parts of a question you've calculated that the mean mark of students in a test was 55.48 and the median was 56.
(d) Describe the skewness of the marks of the students, giving a reason for your answer. (2)
1st mark: Negative skew
2nd mark: because mean < median
27 Skew
Given the quartiles and median, how would you work out whether the distribution had positive or negative skew?
[Three diagrams not reproduced, labelled positive skew, negative skew and no skew.]
29 Calculating Skew
One measure of skew can be calculated using the following formula:
Skew = 3(mean − median) ÷ standard deviation
(Important note: this will be given to you in the exam if required.)
When mean > median, mean < median, and mean = median, we can see this gives us a positive value, negative value, and 0 respectively, as expected.
Find the skew of the following teachers' annual salaries:
£3 £ £4 £7 £100
Mean = £23.50
Median = £4
Standard deviation = £38.28
Skew = 1.53
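The skew value for the salaries can be checked from the three summary statistics given on the slide. This sketch uses only those figures (mean 23.50, median 4, standard deviation 38.28) because the list of salaries is incomplete in the transcript; the function name `skew_coefficient` is hypothetical.

```python
def skew_coefficient(mean, median, standard_deviation):
    """Skew measure from the slide: 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / standard_deviation


salary_skew = skew_coefficient(23.50, 4, 38.28)
print(round(salary_skew, 2))  # 1.53
```

The positive result agrees with mean > median.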
31 Revision
Stem and leaf diagrams:
Can you construct one, and write the appropriate key?
Can you calculate mode, mean, median and quartiles?
Can you assess skewness by using these values?
Back-to-back stem and leaf diagrams:
Can you construct one with an appropriate key?
Can you compare the data on each side?
[Stem and leaf diagram not reproduced. Key: 2 | 1 means 2.1]
Mode = 4.7
Lower quartile = 3.6
Median = 4.05
Upper quartile = 4.7
Type of skew: positive
Reason: Q3 − Q2 > Q2 − Q1
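The quartile comparison used as the reason here can be sketched as a small function. The name `skew_from_quartiles` is made up for illustration, and the tolerance check for 'no skew' is an added assumption to cope with floating-point rounding.

```python
import math


def skew_from_quartiles(q1, q2, q3):
    """Classify skew by comparing Q3 - Q2 with Q2 - Q1."""
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    # Treat nearly equal gaps as symmetric (assumption, to absorb float rounding).
    if math.isclose(upper_gap, lower_gap):
        return "none"
    return "positive" if upper_gap > lower_gap else "negative"


# Values from this slide: Q1 = 3.6, median = 4.05, Q3 = 4.7.
print(skew_from_quartiles(3.6, 4.05, 4.7))  # positive
```

Here Q3 − Q2 is about 0.65 and Q2 − Q1 is about 0.45, which matches the slide's answer of positive skew.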
32 Revision
[Back-to-back stem and leaf diagram of girls' and boys' pulse rates not reproduced. Notice the values go outwards from the centre. Key: 0 | 4 | 6 means 40 for girls and 46 for boys.]
The data above shows the pulse rates of boys and girls in a school. Comment on the results.
Boys' pulse rates tend to be lower than girls'.
33 Revision
Histograms
Can you:
Appreciate that the frequency density scale doesn't matter. This is why frequency is only proportional to area, and not equal to it.
You often need to identify the scaling Area × k = Frequency. You might only be given the total frequency (in which case you need to find the total area of the histogram to find k). But if you know the frequency associated with a particular bar, just find the area of that single bar.
If you don't care about the scaling, then Frequency Density = Frequency ÷ Class Width.
Be incredibly careful about class widths (i.e. widths of boxes). If the class interval in the frequency table was 20−25 with gaps, then you'd draw 19.5−25.5 on the histogram, and use 6 as the width of the box.
If you want to find the quartiles/median/mean, you need to first construct a grouped frequency table using the histogram.
When asked to find the number of people with values in a certain range (e.g. with times between 10 and 15s) and it crosses multiple ranges/bars, it's easier to use the frequency table you've constructed from the histogram. Use linear interpolation where necessary.
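The class width warning above can be sketched in code. This assumes values are recorded to the nearest whole unit, so each class stretches half a unit beyond its stated limits; the function name `class_boundaries` and the `gap` parameter are hypothetical.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return the true (lower, upper) boundaries of a class that has gaps.

    `gap` is the distance between one class's upper limit and the next
    class's lower limit (1 when data is rounded to the nearest whole unit).
    """
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


low, high = class_boundaries(20, 25)
width = high - low
print(low, high, width)  # 19.5 25.5 6.0
```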
35 Revision
Given that an outlier is a value 1.5 × IQR outside the lower and upper quartiles…
Smallest values: 0, 3
Largest values: 21, 27
Lower quartile: 8
Median: 10
Upper quartile: 14

Smallest values: 3, 7
Largest values: 20, 25, 26
Lower quartile: 12
Median: 13
Upper quartile: 16
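A possible way to work through these two data sets is sketched below. It follows the convention from the earlier outliers slide, where a whisker stops at the 'outlier boundary' if there are outliers on that side and at the extreme value otherwise. The function name `box_plot_ends` and the reading of the two tables (smallest values 0, 3 and largest 21, 27; smallest 3, 7 and largest 20, 25, 26) are assumptions based on the transcript.

```python
def box_plot_ends(smallest, largest, q1, q3, multiplier=1.5):
    """Return (lower whisker end, upper whisker end, outliers)."""
    iqr = q3 - q1
    low_fence = q1 - multiplier * iqr
    high_fence = q3 + multiplier * iqr
    low_outliers = [v for v in smallest if v < low_fence]
    high_outliers = [v for v in largest if v > high_fence]
    # Whisker stops at the boundary only when outliers exist on that side.
    low_end = low_fence if low_outliers else min(smallest)
    high_end = high_fence if high_outliers else max(largest)
    return low_end, high_end, low_outliers + high_outliers


first = box_plot_ends([0, 3], [21, 27], q1=8, q3=14)
second = box_plot_ends([3, 7], [20, 25, 26], q1=12, q3=16)
print(first)   # (0, 23.0, [27])
print(second)  # (6.0, 22.0, [3, 25, 26])
```

In the first data set only 27 is an outlier; in the second, 3, 25 and 26 are.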
36 Revision
Skewness
You can determine skewness in three ways:
Comparing quartiles: When Q3 − Q2 > Q2 − Q1, the right-hand section of the box in the box plot is wider, so it's positive skew. If a box plot is drawn, it should be immediately obvious!
Comparing mean/median: When mean > median, large values have dragged up the mean, so there's a tail in the positive direction, and thus the skew is positive.
Looking at the shape of the distribution: If there's a 'positive tail', the skew is positive.
When asked to justify your answer for skewness, you're expected to put something like "Q3 − Q2 > Q2 − Q1" or "mean > median".
You will always be given a formula if you have to calculate a value for skew. But for all formulae, 0 means no skew (i.e. a "symmetric distribution"), > 0 means positive skew and < 0 means negative skew.
Find the skew of the following teachers' annual salaries:
£3 £ £4 £7 £100
Mean = £23.50
Median = £4
Standard deviation = £38.28
Skew = 3(mean − median) ÷ standard deviation
Skew = 1.53