S1: Chapter 4 Representation of Data

Presentation transcript:

S1: Chapter 4 Representation of Data Dr J Frost (jfrost@tiffin.kingston.sch.uk) www.drfrostmaths.com Last modified: 20th September 2015

Overview
We’ll look at 3 different ways of presenting data, as well as ways of analysing them (including ‘skew’).
BOX PLOTS. *NEW since GCSE!* Outliers.
STEM AND LEAF. *NEW since GCSE!* Back-to-back stem and leaf diagrams.
HISTOGRAMS. *NEW since GCSE!* Area is not necessarily equal to frequency.

Skew
Skew gives a measure of whether the values are more spread out above the median or below the median.
[Figure: two frequency curves (Frequency against Height, Frequency against Weight), each with the mode, median and mean marked.]
We say one distribution has positive skew (to remember, think that the ‘tail’ points in the positive direction) and the other has negative skew.

Skew
Remember, think what direction the ‘tail’ is likely to point.
Distribution | Skew
Salaries in the UK | High salaries drag the mean up. So positive skew. Mean > Median
IQ | A symmetrical distribution, i.e. no skew. Mean = Median
Heights of people in the UK | Will probably be a nice ‘bell curve’, i.e. no skew. Mean = Median
Age of retirement | Likely to be people who retire significantly before the median age, but not many who retire significantly after. So negative skew. Mean < Median
The way to remember which way the mean and median go around: the mean is ‘dragged up’ by a tail in the positive direction, so mean > median for positive skew. Picture salaries and you’ll never forget!

Skew based on mean/median
Suppose for some data we had calculated that mean = 55.48 and median = 56. Describe the skewness of the marks of the students, giving a reason for your answer. (2)
Negative skew (1st mark) because mean < median (2nd mark).
Bro Tip: If you ever forget which way the two go, just think of salaries! High values (i.e. a positive tail) drag up the mean but not the median. So it’s the position of the mean that determines skew.

Skew based on quartiles
Positive skew: $Q_3 - Q_2 > Q_2 - Q_1$ (the data is spread out more in the positive direction, so we have positive skew)
Negative skew: $Q_2 - Q_1 > Q_3 - Q_2$
No skew: $Q_2 - Q_1 = Q_3 - Q_2$
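The quartile comparison can be sketched in a few lines of Python. The function name `skew_from_quartiles`, the tolerance and the quartile values below are illustrative assumptions, not part of the slides.

```python
import math

def skew_from_quartiles(q1, q2, q3, tol=1e-9):
    """Classify skew by comparing the gaps either side of the median."""
    upper = q3 - q2  # spread between the median and the upper quartile
    lower = q2 - q1  # spread between the lower quartile and the median
    if math.isclose(upper, lower, abs_tol=tol):
        return "no skew"
    return "positive skew" if upper > lower else "negative skew"

# Illustrative quartiles (hypothetical values).
print(skew_from_quartiles(3, 5, 8))  # upper gap 3, lower gap 2
print(skew_from_quartiles(3, 5, 6))  # upper gap 1, lower gap 2
print(skew_from_quartiles(2, 4, 6))  # equal gaps
```

The tolerance only matters for decimal quartiles, where two gaps that should be equal can differ by a rounding error.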

Example Exam Question
[Exam question shown on the slide as an image.]
$Q_3 - Q_2 > Q_2 - Q_1$ (1st mark)
Therefore positive skew. (2nd mark)

Test Your Understanding
Available Data | Comment on skew (2 marks)
Median = 4, Mean = 5 | Positive skew as mean > median
$Q_1 = 3$, $Q_2 = 5$, $Q_3 = 6$ | Negative skew as $Q_2 - Q_1 > Q_3 - Q_2$
Median = 5.71, Mean = 5.72 | Little/no skew as median and mean are roughly equal.

Calculating Skew
One measure of skew can be calculated using the following formula (Important Note: this will be given to you in the exam if required):
$$\text{skew} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$
When mean > median, mean < median, and mean = median, we can see this gives us a positive value, negative value, and 0 respectively, as expected.
Find the skew of the following teachers’ annual salaries: £3, £3.50, £4, £7, £100
Mean = £23.50
Median = £4
Standard Deviation = £38.28
Skew = 1.53
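As a quick check of the worked example, here is a minimal Python sketch of the formula. It assumes the standard deviation is the population version (dividing by n), which is the one that reproduces the £38.28 on the slide. The function name `skew_coefficient` is made up for illustration.

```python
import statistics

def skew_coefficient(values):
    """3(mean - median) / standard deviation, using the population SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # assumption: divide by n, not n - 1
    return 3 * (mean - median) / sd

salaries = [3, 3.50, 4, 7, 100]
print(round(statistics.mean(salaries), 2))    # mean
print(statistics.median(salaries))            # median
print(round(statistics.pstdev(salaries), 2))  # standard deviation
print(round(skew_coefficient(salaries), 2))   # skew
```

The single £100 value pulls the mean far above the median, which is why the coefficient comes out strongly positive.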

Exercise 1
1. Using the available data in each case, state the skew (1 mark) and give a justification (1 mark).
$Q_1 = 3$, $Q_2 = 5$, $Q_3 = 8$: Positive skew as $Q_3 - Q_2 > Q_2 - Q_1$
Mean = 3.2, Median = 3.5: Negative skew as mean < median
$Q_1 = 6.6$, $Q_2 = 7.7$, $Q_3 = 8.8$: No skew as $Q_2 - Q_1 = Q_3 - Q_2$
Mean = 8.91, Median = 8.78: Positive skew as mean > median
$Q_1 = 4.7$, $Q_2 = 7.1$, $Q_3 = 7.3$: Negative skew as $Q_2 - Q_1 > Q_3 - Q_2$
2. In each case state whether the mean or median would be a more appropriate average (1 mark), and give a reason (1 mark).
$Q_1 = 3$, $Q_2 = 4$, $Q_3 = 10$: Median as the data is (positively) skewed.
Median = 5.61, Mean = 4.3: Median as the data is (negatively) skewed.

Stem and Leaf recap
Put the following measurements into a stem and leaf diagram:
4.7 3.6 3.8 4.7 4.1 2.2 3.6 4.0 4.4 5.0 3.7 4.6 4.8 3.7 3.2 2.5 3.6 4.5 4.7 5.2 4.7 4.2 3.8 5.1 1.4 2.1 3.5 4.2 2.4 5.1
1 | 4 (1)
2 | 1 2 4 5 (4)
3 | 2 5 6 6 6 7 7 8 8 (9)
4 | 0 1 2 2 4 5 6 7 7 7 7 8 (12)
5 | 0 1 1 2 (4)
Key: 2 | 1 means 2.1
It may be quicker to first draw the stem and leaf without the ordering, before then ordering each row.
Now find:
Mode = 4.7
Lower Quartile = 3.6
Median = 4.05
Upper Quartile = 4.7
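The diagram and the summary values can be reproduced with a short Python sketch. It assumes the usual S1 quartile convention (if the position n/4, n/2 or 3n/4 is a whole number, average that item and the next; otherwise round the position up). The helper names `stem_and_leaf` and `quartile` are illustrative. Values are handled as whole tenths to avoid floating-point surprises.

```python
import statistics
from collections import defaultdict

data = [4.7, 3.6, 3.8, 4.7, 4.1, 2.2, 3.6, 4.0, 4.4, 5.0,
        3.7, 4.6, 4.8, 3.7, 3.2, 2.5, 3.6, 4.5, 4.7, 5.2,
        4.7, 4.2, 3.8, 5.1, 1.4, 2.1, 3.5, 4.2, 2.4, 5.1]

def stem_and_leaf(tenths):
    """Group whole-tenth values into {stem: ordered leaves}."""
    rows = defaultdict(list)
    for t in sorted(tenths):
        stem, leaf = divmod(t, 10)
        rows[stem].append(leaf)
    return dict(rows)

def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) of an ordered list, S1 convention."""
    whole, rem = divmod(len(ordered) * k, 4)
    if rem == 0:
        # Position is a whole number: average that item and the next one.
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]  # position rounded up to item number whole + 1

tenths = sorted(round(x * 10) for x in data)
rows = stem_and_leaf(tenths)
for stem, leaves in rows.items():
    print(f"{stem} | {' '.join(map(str, leaves))}  ({len(leaves)})")

mode = statistics.mode(tenths) / 10
lq, med, uq = (quartile(tenths, k) / 10 for k in (1, 2, 3))
print(mode, lq, med, uq)
```

Other quartile conventions (for example the interpolating ones used by spreadsheet software) can give slightly different values for the same data.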

Back-to-Back Stem and Leaf recap
Girls: 80 84 91 80 98 40 60 64 72 96 85 88 76 54 58 92 80 79
Boys: 60 91 65 67 75 46 72 71 57 64 60 50 68
[Back-to-back stem and leaf diagram: girls’ leaves to the left of the stems 4 to 9, boys’ leaves to the right.]
Key: 0|4|6 means 40 for girls and 46 for boys.
The data above shows the pulse rates of boys and girls in a school. Comment on the results.
The back-to-back stem and leaf diagram shows that boys’ pulse rates tend to be lower than girls’.
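A back-to-back diagram can be built from the two lists as they appear in this transcript. The sketch below is illustrative: the helper name `leaves_by_stem` and the column width are assumptions, and the printed layout may not match the slide's own diagram exactly. It also prints the two medians, which give a numerical basis for the comment.

```python
import statistics

girls = [80, 84, 91, 80, 98, 40, 60, 64, 72, 96, 85, 88, 76, 54, 58, 92, 80, 79]
boys = [60, 91, 65, 67, 75, 46, 72, 71, 57, 64, 60, 50, 68]

def leaves_by_stem(values):
    """Return {tens digit: ordered list of units digits}."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault(stem, []).append(leaf)
    return rows

left, right = leaves_by_stem(girls), leaves_by_stem(boys)
stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
for stem in stems:
    # Left-hand leaves are reversed so they increase outwards from the stem.
    l = " ".join(str(x) for x in reversed(left.get(stem, [])))
    r = " ".join(str(x) for x in right.get(stem, []))
    print(f"{l:>12} | {stem} | {r}")

girls_median = statistics.median(girls)
boys_median = statistics.median(boys)
print(girls_median, boys_median)
```

For the listed data the boys' median is lower than the girls', which agrees with the comment on the slide.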

Box Plot recap
Box plots allow us to visually represent the distribution of the data.
Minimum: 3. Lower Quartile: 15. Median: 17. Upper Quartile: 22. Maximum: 27.
[Figure: box plot drawn on a scale from 0 to 30, with the range and the IQR marked.]
How is the IQR represented in this diagram? It is the width of the box, here 22 − 15 = 7.
How is the range represented in this diagram? It is the distance between the ends of the whiskers, here 27 − 3 = 24.

Outliers
An outlier is: an extreme value.
More specifically, it’s generally a value more than 1.5 IQRs below the lower quartile or above the upper quartile. (But you will be told in the exam if the rule differs from this.)
[Figure: box plot on a scale from 0 to 30, with the outlier boundaries marked and outliers lying beyond them.]

Examples
Smallest values: 0, 3. Largest values: 21, 27. Lower Quartile: 8. Median: 10. Upper Quartile: 14.
Draw a box plot to represent the above data.
$IQR = 14 - 8 = 6$
Outlier boundaries: $14 + (1.5 \times 6) = 23$ and $8 - (1.5 \times 6) = -1$
So 27 is an outlier (it is above 23), while 0 is not (it is above −1).
Bro Exam Tip: You MUST show your outlier boundary calculations.
When there’s an outlier at one end, there are two allowable places to put the end of the whisker:
The maximum value that is not an outlier, 21 (I think this one makes most sense).
OR the outlier boundary, 23.
Use one or the other (not both).
[Figure: box plot on a scale from 0 to 30.]
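The boundary calculation is easy to sketch in Python. The function name `outlier_boundaries` is made up for illustration, and the multiplier is left as a parameter because the exam may specify a different rule.

```python
def outlier_boundaries(q1, q3, k=1.5):
    """Return (lower, upper) boundaries: k IQRs beyond each quartile."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = outlier_boundaries(8, 14)
extremes = [0, 3, 21, 27]  # the smallest and largest values given
outliers = [x for x in extremes if x < lower or x > upper]
# One allowable whisker end: the largest value that is not an outlier.
whisker_end = max(x for x in extremes if x <= upper)

print(lower, upper)   # boundaries
print(outliers)       # values beyond the boundaries
print(whisker_end)    # upper whisker under the first convention
```

Under the second convention on the slide, the whisker would instead end at the upper boundary itself.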

Test Your Understanding (on your printed sheet): parts (a), (b) and (c).

Comparing Box Plots
Box plot comparing house prices of Croydon and Kingston-upon-Thames.
[Figure: two box plots, Croydon and Kingston, on a scale from £100k to £450k.]
“Compare the prices of houses in Croydon with those in Kingston”. (2 marks)
For 1 mark, one of:
The interquartile range of house prices in Kingston is greater than in Croydon.
The range of house prices in Kingston is greater than in Croydon.
i.e. Something spread related.
For 1 mark:
The median house price in Kingston was greater than that in Croydon.
i.e. Compare some measure of location (could be minimum, lower quartile, etc.)

Test Your Understanding (on your printed sheet): Jan 2005 Q2

Exercise 2 (on your printed sheet)

Bar Charts vs Histograms
Histograms: For continuous data (use this as a reason whenever you’re asked to justify use of a histogram). Data divided into (potentially uneven) intervals. [GCSE definition] Frequency given by area of bars.* No gaps between bars.
Bar Charts: For discrete data. Frequency given by height of bars.
[Figures: a histogram of Frequency Density against Height (1.0m to 1.8m) and a bar chart of Frequency against Shoe Size (6 to 9).]
* Not necessarily true. We’ll correct this in a sec.

Bar Charts vs Histograms
Q1 Still using the ‘incorrect’ GCSE formula, Frequency Density = Frequency ÷ Width:
Weight (w kg) | Frequency | Frequency Density
0 < w ≤ 10 | 40 | 4
10 < w ≤ 15 | 6 | 1.2
15 < w ≤ 35 | 52 | 2.6
35 < w ≤ 45 | 10 | 1
Q2 [Figure: histogram of Frequency Density (scale 1 to 5) against Height (m) (scale 10 to 50).] Frequencies read from the bars: 40, 15, 25 and 30.
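The Q1 table can be checked with a small Python sketch of the GCSE relationship, frequency density = frequency ÷ class width. The tuple layout and variable names are illustrative assumptions.

```python
# (lower bound, upper bound, frequency) for each weight class in Q1
classes = [(0, 10, 40), (10, 15, 6), (15, 35, 52), (35, 45, 10)]

densities = []
for lo, hi, freq in classes:
    width = hi - lo
    densities.append(freq / width)  # frequency density = frequency / width
    print(f"{lo} < w <= {hi}: width {width}, frequency density {freq / width}")
```

Reading a frequency off a histogram is the same relationship rearranged: frequency = frequency density × class width.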

SKILL #1 :: Area = frequency?
Unlike at GCSE, the area of a bar is not necessarily equal to the frequency; they are just proportional.
! Identify the scaling $k$, where $\text{area} \times k = \text{frequency}$, using a known area with known frequency (which may be the total area/frequency or just one bar).
There were 60 runners in a 100m race. The following histogram represents their times. Determine the number of runners with times above 14s.
[Figure: histogram of Frequency Density (scale 1 to 5) against Time (s), with 9, 12 and 18 marked on the time axis.]
Total frequency is known; therefore find the total area and hence the ‘scaling’.
Total area = 15 + 9 = 24
Area 24, ×2.5, Freq 60
Then use this scaling along with the desired area.
Area = 4 × 1.5 = 6
Area 6, ×2.5, Freq 15
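The scaling method can be sketched in Python. The bar heights below (5 for 9s to 12s, 1.5 for 12s to 18s) are an assumption inferred from the areas 15 and 9 and the axis marks in the transcript, since the histogram itself is not reproduced. The function name `frequency_between` is illustrative.

```python
def frequency_between(bars, total_frequency, lo, hi):
    """Estimate the frequency between lo and hi from histogram bars.

    bars is a list of (start, end, height) tuples; the heights only need
    to be correct up to a common scale factor.
    """
    total_area = sum((end - start) * h for start, end, h in bars)
    k = total_frequency / total_area  # area x k = frequency
    # Area of each bar that overlaps the interval [lo, hi].
    area = sum(max(0, min(end, hi) - max(start, lo)) * h
               for start, end, h in bars)
    return k * area

# Assumed bars: areas 3 x 5 = 15 and 6 x 1.5 = 9, total 24.
bars = [(9, 12, 5), (12, 18, 1.5)]
runners_above_14 = frequency_between(bars, 60, 14, 18)
print(runners_above_14)
```

Because only the ratio of areas matters, the same function works whatever units the frequency density axis is drawn in.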

Test Your Understanding (on your printed sheet): May 2012 Q5
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.
(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample. (4 marks)
Bro Tip: We can make the frequency density scale what we like.
M1 A1: Determine what one small square or one large square is worth (i.e. work out the area-to-frequency scaling).
Area 112.5, ×4, Freq 450
M1 A1: Use this to find the number of cars travelling at more than 35 mph.
Area 22.5, ×4, Freq 90
Write: 22.5 × 4 = 90
[Figure 2: histogram with the frequency density scale marked 1 to 7.]

Test Your Understanding (on your printed sheet)
(b) Estimate the value of the mean speed of the cars in the sample. (3 marks)
M1 M1: Use the histogram to construct the sum of speeds: (30 × 12.5 + 240 × 25 + …) ÷ 450
A1: Correct value = 28.8
Bro Tip: Whenever you are asked to calculate the mean, median or quartiles from a histogram, form a grouped frequency table. Use your scaling factor to work out the frequency of each bar.

Test Your Understanding (on your printed sheet)
(c) Estimate, to 1 decimal place, the value of the median speed of the cars in the sample. (2)
(d) Comment on the shape of the distribution. Give a reason for your answer. (2)
(e) State, with a reason, whether the estimate of the mean or the median is a better representation of the average speed of the traffic on the road. (2)

SKILL #2 :: Gaps!
Weight (to nearest kg) | Frequency | F.D.
1-2 | 4 | 4 ÷ 2 = 2
3-6 | 3 | 3 ÷ 4 = 0.75
7-9 | 3 × 1 = 3 | 1
Note that the gaps affect the class width! Because weights are given to the nearest kg, the class 1-2 really covers 0.5 to 2.5, so its width is 2, not 1.
Remember the frequency density axis is only correct to scale, so there may be some scaling. However, in an exam scaling is unlikely to be required for F.D. if the F.D. scale is already given.
We set the scaling between area and frequency to be 1.
[Figure: histogram of Frequency Density (scale 1 to 2) against a horizontal scale from 1 to 10.]
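The effect of the gaps on class width can be sketched as follows. The helper `class_boundaries` and its `precision` parameter are illustrative assumptions. The sketch takes values recorded to the nearest whole unit, so each class extends half a unit beyond its stated limits.

```python
def class_boundaries(lo, hi, precision=1):
    """True boundaries of a class whose limits are rounded to `precision`."""
    half = precision / 2
    return lo - half, hi + half

# (stated lower limit, stated upper limit, frequency), weights to nearest kg
table = [(1, 2, 4), (3, 6, 3), (7, 9, 3)]

widths, densities = [], []
for lo, hi, freq in table:
    start, end = class_boundaries(lo, hi)
    width = end - start
    widths.append(width)
    densities.append(freq / width)  # taking area = frequency (scaling of 1)
    print(f"{lo}-{hi}: boundaries {start} to {end}, width {width}, F.D. {freq / width}")
```

Using the stated limits directly (for example a width of 2 − 1 = 1 for the first class) would overstate every frequency density.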

Test Your Understanding (on your printed sheet): Jan 2012 Q1
Bro Tip: Be careful that you use the correct class widths! Be sure to ask first what they notice about the ranges in the table (i.e. gaps!)
Answers: 14; 5; 21 + 45 + 3 = 69

SKILL #3 :: Width and height on diagram
An exam favourite is to ask what width and height we’d draw a bar in a drawn histogram.
Q: The frequency table shows some running times. On a histogram the bar for 0-4 seconds is drawn with width 6cm and height 8cm. Find the width and height of the bar for 4-6 seconds.
Time (seconds) | Frequency
$0 \le t < 4$ | 8
$4 \le t < 6$ | 9
! Bro Tip (strategy): Find the scaling for class width to drawn width and frequency density to drawn height.
Solution:
For the 0-4 bar: class width = 4, frequency density = 8 ÷ 4 = 2.
∴ Scaling for width: 6 ÷ 4 = 1.5. Scaling for height: 8 ÷ 2 = 4.
For the 4-6 bar: class width = 2, frequency density = 9 ÷ 2 = 4.5.
Width = 2 × 1.5 = 3cm
Height = 4.5 × 4 = 18cm
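The two scalings can be sketched in Python. The function name `drawn_bar` and its argument layout are made up for illustration.

```python
def drawn_bar(ref_class, ref_drawn, target_class):
    """Scale a target bar using a reference bar's drawn dimensions.

    ref_class and target_class are (class width, frequency) pairs;
    ref_drawn is the reference bar's (drawn width, drawn height) in cm.
    """
    ref_width, ref_freq = ref_class
    drawn_width, drawn_height = ref_drawn
    width_scale = drawn_width / ref_width                 # cm per unit of class width
    height_scale = drawn_height / (ref_freq / ref_width)  # cm per unit of F.D.
    width, freq = target_class
    return width * width_scale, (freq / width) * height_scale

# Reference: 0-4 s, frequency 8, drawn 6 cm wide and 8 cm high.
# Target: 4-6 s, frequency 9.
width_cm, height_cm = drawn_bar((4, 8), (6, 8), (2, 9))
print(width_cm, height_cm)
```

Keeping the width and height scalings separate matters because the two axes of a drawn histogram generally use different scales.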

Test Your Understanding (on your printed sheet)

Exercise 3 (on your printed sheet)
Q1
Q2 Answer: Distance is continuous. Note the gaps in the class intervals! 4 / 5 = 0.8, 19 / 5 = 3.8, 53 / 10 = 5.3, ...
Q3
Q4 [June 2007 Q5]
Q5
Q6
Q7 (parts a to e)