S1: Chapter 4 Representation of Data


1 S1: Chapter 4 Representation of Data
Dr J Frost Last modified: 20th September 2015

2 Overview
We’ll look at 3 different ways of presenting data, as well as ways of analysing them (including ‘skew’).
BOX PLOTS: *NEW since GCSE!* Outliers.
STEM AND LEAF: *NEW since GCSE!* Back-to-back stem and leaf diagrams.
HISTOGRAMS: *NEW since GCSE!* Area is not necessarily equal to frequency.

3 Skew
Skew gives a measure of whether the values are more spread out above the median or below the median.
[Figure: two frequency sketches, one against height and one against weight, each marked with the mode, median and mean.]
We say one distribution has positive skew and the other has negative skew.
(To remember: for positive skew, the ‘tail’ points in the positive direction.)

4 Skew
Remember, think what direction the ‘tail’ is likely to point.
Distribution: Skew
Salaries in the UK: High salaries drag the mean up, so positive skew. Mean > Median.
IQ: A symmetrical distribution, i.e. no skew. Mean = Median.
Heights of people in the UK: Will probably be a nice ‘bell curve’, i.e. no skew. Mean = Median.
Age of retirement: Likely to be people who retire significantly before the median age, but not many who retire significantly after. So negative skew. Mean < Median.
The way to remember which way the mean and median go around: the mean is ‘dragged up’ by a tail in the positive direction, so mean > median for positive skew. Picture salaries and you’ll never forget!

5 Skew based on mean/median
Suppose for some data we had calculated that mean = 55.48 and median = 56. Describe the skewness of the marks of the students, giving a reason for your answer. (2)
1st mark: Negative skew
2nd mark: because mean < median
Bro Tip: If you ever forget which way the two go, just think of salaries! High values (i.e. a positive tail) drag up the mean but not the median. So it’s the position of the mean that determines skew.

6 Skew based on quartiles
Positive skew: $Q_3 - Q_2 > Q_2 - Q_1$ (the data is spread out more in the positive direction, so we have positive skew)
Negative skew: $Q_2 - Q_1 > Q_3 - Q_2$
No skew: $Q_2 - Q_1 = Q_3 - Q_2$
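The quartile comparison on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `skew_from_quartiles` and the tolerance used for the "no skew" case are assumptions, not part of the slides.

```python
import math


def skew_from_quartiles(q1: float, q2: float, q3: float) -> str:
    """Classify skew by comparing the gaps on either side of the median."""
    lower_gap = q2 - q1  # spread below the median
    upper_gap = q3 - q2  # spread above the median
    # isclose guards against floating-point noise when the gaps should be equal;
    # the tolerance is an arbitrary illustrative choice.
    if math.isclose(upper_gap, lower_gap, abs_tol=1e-9):
        return "no skew"
    return "positive skew" if upper_gap > lower_gap else "negative skew"


print(skew_from_quartiles(3, 5, 8))        # positive skew
print(skew_from_quartiles(3, 5, 6))        # negative skew
print(skew_from_quartiles(6.6, 7.7, 8.8))  # no skew
```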

7 Example Exam Question
1st mark: $Q_3 - Q_2 > Q_2 - Q_1$
2nd mark: Therefore positive skew.

8 Test Your Understanding
Available Data: Comment on skew (2 marks)
Median = 4, Mean = 5: Positive skew as mean > median.
$Q_1 = 3$, $Q_2 = 5$, $Q_3 = 6$: Negative skew as $Q_2 - Q_1 > Q_3 - Q_2$.
Median = 5.71, Mean = 5.72: Little/no skew as median and mean are roughly equal.

9 Calculating Skew
One measure of skew can be calculated using the following formula: (Important Note: this will be given to you in the exam if required)
$$\text{skew} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$
When mean > median, mean < median, and mean = median, we can see this gives us a positive value, negative value, and 0 respectively, as expected.
Find the skew of the following teachers’ annual salaries: £3 £ £4 £7 £100
Mean = £23.50
Median = £4
Standard Deviation = £38.28
Skew = 1.53
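As a quick check of the formula on this slide, the sketch below plugs in the summary statistics quoted for the salaries example (mean 23.50, median 4, standard deviation 38.28). The function name `skew_coefficient` is an illustrative assumption; it works from the summary values rather than the raw data.

```python
def skew_coefficient(mean: float, median: float, std_dev: float) -> float:
    """Return 3 * (mean - median) / standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / std_dev


# Summary statistics as stated on the slide
skew = skew_coefficient(mean=23.50, median=4, std_dev=38.28)
print(round(skew, 2))  # 1.53
```

A positive result agrees with the rule that mean > median indicates positive skew.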

10 Exercise 1
Q1 Using the available data in each case, state the skew (1 mark) and give a justification (1 mark).
$Q_1 = 3$, $Q_2 = 5$, $Q_3 = 8$: Positive skew as $Q_3 - Q_2 > Q_2 - Q_1$
Mean = 3.2, Median = 3.5: Negative skew as mean < median
$Q_1 = 6.6$, $Q_2 = 7.7$, $Q_3 = 8.8$: No skew as $Q_2 - Q_1 = Q_3 - Q_2$
Mean = 8.91, Median = 8.78: Positive skew as mean > median
$Q_1 = 4.7$, $Q_2 = 7.1$, $Q_3 = 7.3$: Negative skew as $Q_2 - Q_1 > Q_3 - Q_2$
Q2 In each case state whether the mean or median would be a more appropriate average (1 mark), and give a reason (1 mark).
$Q_1 = 3$, $Q_2 = 4$, $Q_3 =$ [value missing]: Median as the data is (positively) skewed.
Median = 5.61, Mean = [value missing]: Median as the data is (negatively) skewed.

11 Exercise 1 Q3

12 Exercise 1 Q4

13 Stem and Leaf recap
Put the following measurements into a stem and leaf diagram:
[Stem and leaf diagram with stems 1 to 5 and row counts in brackets; the measurements and leaves are not recoverable.]
Key: 2 | 1 means 2.1
It may be quicker to first draw the stem and leaf without the ordering, before then ordering each row.
Now find:
Mode = 4.7
Lower Quartile = 3.6
Upper Quartile = 4.7
Median = 4.05

14 Back-to-Back Stem and Leaf recap
[Back-to-back stem and leaf diagram: Girls on the left, Boys on the right, stems 4 to 9.]
Key: 0|4|6 means 40 for girls and 46 for boys.
The data above shows the pulse rate of boys and girls in a school. Comment on the results.
The back-to-back stem and leaf diagram shows that boys’ pulse rates tend to be lower than girls’.

15 Box Plot recap
Box Plots allow us to visually represent the distribution of the data.
Minimum: 3
Lower Quartile: 15
Median: 17
Upper Quartile: 22
Maximum: 27
[Box plot sketch with the range and IQR marked.]
How is the IQR represented in this diagram?
How is the range represented in this diagram?

16 Outliers
An outlier is: an extreme value.
More specifically, it’s generally when we’re 1.5 IQRs beyond the lower and upper quartiles, i.e. below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$. (But you will be told in the exam if the rule differs from this.)
[Box plot sketch marking ‘Outliers beyond this point’.]

17 Examples
Smallest values: 0, 3
Largest values: 21, 27
Lower Quartile: 8
Median: 10
Upper Quartile: 14
Draw a box plot to represent the above data.
$IQR = 14 - 8 = 6$
Outlier boundaries: $14 + (1.5 \times 6) = 23$ and $8 - (1.5 \times 6) = -1$
Bro Exam Tip: You MUST show your outlier boundary calculations.
When there’s an outlier at one end, there are two allowable places to put the end of the whisker:
The maximum value that is not an outlier, 21 (I think this one makes most sense),
OR the outlier boundary, 23.
Use one or the other (not both).
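The boundary calculation in this example can be sketched as follows, assuming the usual 1.5 × IQR rule. The helper name `outlier_boundaries` and the variable names are illustrative only; the quartiles and extreme values are the ones given on the slide.

```python
def outlier_boundaries(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) boundaries, k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = outlier_boundaries(q1=8, q3=14)
print(lower, upper)  # -1.0 23.0

# Only the two smallest and two largest values are known from the slide
extremes = [0, 3, 21, 27]
outliers = [x for x in extremes if x < lower or x > upper]
print(outliers)  # [27]

# First allowable whisker end: the largest value that is not an outlier
whisker_end = max(x for x in extremes if x <= upper)
print(whisker_end)  # 21
```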

18 Test Your Understanding
(on your printed sheet) Parts a, b, c

19 Comparing Box Plots
Box Plot comparing house prices of Croydon and Kingston-upon-Thames.
[Box plots for Croydon and Kingston on a price axis from £100k to £450k.]
“Compare the prices of houses in Croydon with those in Kingston”. (2 marks)
For 1 mark, one of (i.e. something spread related):
The interquartile range of house prices in Kingston is greater than in Croydon.
The range of house prices in Kingston is greater than in Croydon.
For 1 mark (i.e. compare some measure of location; could be minimum, lower quartile, etc.):
The median house price in Kingston was greater than that in Croydon.

20 Test Your Understanding
(on your printed sheet) Jan 2005 Q2

21 Exercise 2 (on your printed sheet) Parts a, b, c, d

22 Exercise 2 (on your printed sheet)

23 Exercise 2 (on your printed sheet)

24 Exercise 2 (on your printed sheet)

25 Exercise 2 (on your printed sheet)
(Solutions to (d) and (e) on next slide)

26 Exercise 2 (on your printed sheet)

27 Exercise 2 (on your printed sheet) 45 5 52 63 12 17 28

28 Bar Charts vs Histograms
Histograms
For continuous data. (Use this as a reason whenever you’re asked to justify use of a histogram.)
Data divided into (potentially uneven) intervals.
[GCSE definition] Frequency given by area of bars.*
No gaps between bars.
Bar Charts
For discrete data.
Frequency given by height of bars.
[Figures: a histogram of frequency density against height (m), and a bar chart of frequency against shoe size.]
* Not necessarily true. We’ll correct this in a sec.

29 Bar Charts vs Histograms
Q1 Still using the ‘incorrect’ GCSE formula:
Weight (w kg) | Frequency | Frequency Density
0 < w ≤ 10 | 40 | 4
10 < w ≤ 15 | 6 | 1.2
15 < w ≤ 35 | 52 | 2.6
35 < w ≤ 45 | 10 | 1
Formula triangle: Freq = F.D. × Width, so F.D. = Freq ÷ Width.
Q2 [Histogram of frequency density (axis marked 1 to 5) against height (m).] Frequencies read from the bars: 40, 15, 25 and 30.
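The Q1 table can be reproduced with the GCSE relationship frequency density = frequency ÷ class width. The sketch below uses the class intervals and frequencies from the table; the function and variable names are illustrative assumptions.

```python
def frequency_density(frequency: float, lower: float, upper: float) -> float:
    """GCSE formula: frequency divided by class width."""
    return frequency / (upper - lower)


# (lower bound, upper bound, frequency) for each weight class in Q1
classes = [(0, 10, 40), (10, 15, 6), (15, 35, 52), (35, 45, 10)]

densities = [frequency_density(f, lo, hi) for lo, hi, f in classes]
print(densities)  # [4.0, 1.2, 2.6, 1.0]
```

Going the other way, multiplying each density by its class width recovers the frequency, which is what Q2 asks for when reading frequencies off a histogram.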

30 SKILL #1 :: Area = frequency?
Unlike at GCSE, the area of a bar is not necessarily equal to the frequency; they are just proportional.
! Identify the scaling $k$, where $\text{frequency} = k \times \text{area}$, using a known area with known frequency (which may be total area/frequency or just one bar).
There were 60 runners in a 100m race. The following histogram represents their times. Determine the number of runners with times above 14s.
[Histogram of frequency density (axis marked 1 to 5) against time (s), with 9, 12 and 18 marked on the time axis.]
Total frequency is known; therefore find total area and hence the ‘scaling’.
Total area = 24, so the scaling from area to frequency is 60 ÷ 24 = 2.5.
Then use this scaling along with the desired area.
Area above 14s = 4 × 1.5 = 6, so the number of runners is 6 × 2.5 = 15.
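The scaling argument for the runners example can be checked with a short calculation. The sketch assumes the figures stated on the slide (60 runners, total area 24, and a bar of width 4 and height 1.5 above 14s); the function name is hypothetical.

```python
def area_to_frequency_scale(known_frequency: float, known_area: float) -> float:
    """Return k such that frequency = k * area."""
    return known_frequency / known_area


# Total frequency (60 runners) and total area (24) are both known
k = area_to_frequency_scale(known_frequency=60, known_area=24)
print(k)  # 2.5

# Area of the part of the histogram above 14s, as given on the slide
area_above_14 = 4 * 1.5
runners_above_14 = k * area_above_14
print(runners_above_14)  # 15.0
```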

31 Test Your Understanding
(on your printed sheet) May 2012 Q5
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.
(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample. (4 marks)
[Figure 2: histogram with the frequency density axis marked 1 to 7.]
Bro Tip: We can make the frequency density scale what we like.
M1 A1: Determine what one small square or one large square is worth (i.e. work out the area → freq scaling).
M1 A1: Use this to find number of cars travelling >35mph. Write: 22.5 × 4 = 90

32 Test Your Understanding
(on your printed sheet)
(b) Estimate the value of the mean speed of the cars in the sample. (3 marks)
M1 M1: Use histogram to construct sum of speeds: (30× ×25+…) ÷ 450
A1: Correct value = 28.8
Bro Tip: Whenever you are asked to calculate mean, median or quartiles from a histogram, form a grouped frequency table. Use your scaling factor to work out the frequency of each bar.

33 Test Your Understanding
(on your printed sheet)
(c) Estimate, to 1 decimal place, the value of the median speed of the cars in the sample. (2)
(d) Comment on the shape of the distribution. Give a reason for your answer. (2)
(e) State, with a reason, whether the estimate of the mean or the median is a better representation of the average speed of the traffic on the road. (2)

34 SKILL #2 :: Gaps!
Weight (to nearest kg) | Frequency | F.D.
1-2 | 4 | $4 \div 2 = 2$
3-6 | 3 | $3 \div 4 = 0.75$
7-9 | $3 \times 1 = 3$ | 1
Note that the gaps affect class width! To the nearest kg, 1-2 means $0.5 \le w < 2.5$, so the class width is 2, not 1.
Remember the frequency density axis is only correct to scale, so there may be some scaling. However, in an exam scaling is unlikely to be required for F.D. if the F.D. scale is already given.
We set the scaling between area and frequency to be 1.
[Histogram: frequency density (axis marked 1 and 2) against time (s).]
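The effect of the gaps on class width can be sketched as below. It assumes values are rounded to the nearest whole unit, so each class extends half a unit beyond its stated limits; the function name and the `unit` parameter are illustrative, and the area-to-frequency scaling is taken to be 1 as on the slide.

```python
def class_width_nearest_unit(low: float, high: float, unit: float = 1.0) -> float:
    """Width of a class whose limits are rounded to the nearest `unit`."""
    true_lower = low - unit / 2   # e.g. 1 becomes 0.5
    true_upper = high + unit / 2  # e.g. 2 becomes 2.5
    return true_upper - true_lower


# (stated lower limit, stated upper limit, frequency) from the table
table = [(1, 2, 4), (3, 6, 3), (7, 9, 3)]

results = []
for low, high, freq in table:
    width = class_width_nearest_unit(low, high)
    results.append((width, freq / width))  # (class width, frequency density)

print(results)  # [(2.0, 2.0), (4.0, 0.75), (3.0, 1.0)]
```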

35 Test Your Understanding
(on your printed sheet) Jan 2012 Q1
Bro Tip: Be careful that you use the correct class widths! Be sure to ask first what they notice about the ranges in the table (i.e. gaps!)
14, 5, = 69

36 SKILL #3 :: Width and height on diagram
An exam favourite is to ask what width and height we’d draw a bar in a drawn histogram.
Q: The frequency table shows some running times. On a histogram the bar for 0-4 seconds is drawn with width 6cm and height 8cm. Find the width and height of the bar for 4-6 seconds.
Time (seconds) | Frequency
$0 \le t < 4$ | 8
$4 \le t < 6$ | 9
Strategy
! Bro Tip: Find the scaling for class width to drawn width and frequency density to drawn height.
Solution
For 0-4 bar: Class width = 4, Frequency density = 8 ÷ 4 = 2
∴ Scaling for width: 6 ÷ 4 = 1.5. Scaling for height: 8 ÷ 2 = 4.
4-6 bar: class width 2, frequency density 9 ÷ 2 = 4.5
Width = 2 × 1.5 = 3cm
Height = 4.5 × 4 = 18cm
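The strategy for drawn widths and heights can be written as a small function. This is a sketch using the numbers from the question (reference bar 0-4s with frequency 8, drawn 6cm wide and 8cm high); the function and parameter names are assumptions made for illustration.

```python
def drawn_bar(ref_class_width: float, ref_frequency: float,
              ref_drawn_width: float, ref_drawn_height: float,
              class_width: float, frequency: float) -> tuple[float, float]:
    """Return (drawn width, drawn height) for a bar, scaled from a reference bar."""
    # Scaling from class width to drawn width
    width_scale = ref_drawn_width / ref_class_width
    # Scaling from frequency density to drawn height
    ref_density = ref_frequency / ref_class_width
    height_scale = ref_drawn_height / ref_density
    density = frequency / class_width
    return class_width * width_scale, density * height_scale


# Reference bar: 0-4s, frequency 8, drawn 6cm by 8cm. Target bar: 4-6s, frequency 9.
width_cm, height_cm = drawn_bar(4, 8, 6, 8, class_width=2, frequency=9)
print(width_cm, height_cm)  # 3.0 18.0
```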

37 Test Your Understanding
(on your printed sheet)

38 Exercise 3 (on your printed sheet) Q1

39 Exercise 3 (on your printed sheet) Q2
Answer: Distance is continuous. Note the gaps in the class intervals! 4 / 5 = 0.8, 19 / 5 = 3.8, 53 / 10 = 5.3, ...

40 Exercise 3 (on your printed sheet) Q3

41 Exercise 3 (on your printed sheet) Q4 [June 2007 Q5]

42 Exercise 3 (on your printed sheet) Q5

43 Exercise 3 (on your printed sheet) Q6

44 Exercise 3 (on your printed sheet) Q7 Parts a, b, c, d, e

