Representation and Summary of Data - Location.

Presentation transcript:

Representation and Summary of Data - Location

This chapter is about calculating averages, also known as 'measures of location'. You will have seen many of the topics at GCSE, but we will begin to use proper mathematical notation when solving problems.

By the end of the chapter you will have seen:
- Key words to remember
- Mean, median and mode (including from a table)
- The difference between a discrete frequency table and a continuous frequency table, and its effect on calculations
- How to use coding

Teachings for Exercise 2A

Key Terms

- Quantitative variable: data which is numerical, e.g. height, profits, number of beads in a bag.
- Qualitative variable: data which is not numerical, e.g. car colour, brand name of clothes.
- Discrete data: numerical data that can only take certain values, e.g. shoe size, goals scored.
- Continuous data: numerical data that can take any value within a range, e.g. height, weight, time taken.

Quantitative data is either continuous or discrete.

Data in a table

Rebecca records the shoe size, x, of the female students in her year. The table shows her results, where x is the data value and f is the frequency.

| Shoe size, x | Number of students, f |
|---|---|
| 35 | 3 |
| 36 | 17 |
| 37 | 29 |
| 38 | 34 |
| 39 | 12 |

Find:
a) The number of students who take size 37: 29
b) The shoe size taken by the smallest number of female students: 35
c) The shoe size taken by the largest number of female students: 38
d) The total number of students in the year: add up the frequencies, 3 + 17 + 29 + 34 + 12 = 95

Data in a table

Add a Cumulative Frequency column to the table by keeping a running total of the frequencies after each additional row.

| Shoe size, x | Number of students, f | Cumulative Frequency |
|---|---|---|
| 35 | 3 | 3 |
| 36 | 17 | 20 |
| 37 | 29 | 49 |
| 38 | 34 | 83 |
| 39 | 12 | 95 |
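The running total can also be checked with a few lines of code. This is a minimal sketch in Python (the language and the variable names are assumptions for illustration, not part of the slides), using the shoe-size table above.

```python
from itertools import accumulate

# Shoe-size table from the slide: size x and number of students f
shoe_sizes = [35, 36, 37, 38, 39]
frequencies = [3, 17, 29, 34, 12]

# A running total of the frequencies gives the cumulative frequency column
cumulative = list(accumulate(frequencies))

for x, f, cf in zip(shoe_sizes, frequencies, cumulative):
    print(x, f, cf)

# The last cumulative frequency is the total number of students
total_students = cumulative[-1]
```

The final entry of the cumulative column should always match the total frequency, which is a quick check on the arithmetic.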

Data in a Grouped Frequency Table

You need to know the following when working with grouped data:
- Groups are known as classes
- You need to know how to find class boundaries
- You need to be able to work out the mid-point of a class
- You need to be able to find the class width

| Length of Wing (mm) | Number of Butterflies, f |
|---|---|
| 30-31 | 2 |
| 32-33 | 25 |
| 34-36 | 30 |
| 37-39 | 13 |

Data in a Grouped Frequency Table

Write down the class boundaries, mid-point and class width for the group 34-36.

| Length of Wing (mm) | Number of Butterflies, f |
|---|---|
| 30-31 | 2 |
| 32-33 | 25 |
| 34-36 | 30 |
| 37-39 | 13 |

a) Class boundaries: as there are gaps between the groups, each group is said to begin and end halfway between itself and its neighbour, so the boundaries are 33.5 and 36.5.
b) Midpoint: add up the boundaries and divide by 2, so (33.5 + 36.5) ÷ 2 = 35 mm.
c) Class width: the upper boundary minus the lower boundary, so 36.5 - 33.5 = 3 mm.

Data in a Grouped Frequency Table

Write down the class boundaries, mid-point and class width for the group 70 < s ≤ 75.

| Time taken (s) | Number of females, f |
|---|---|
| 55 < s ≤ 65 | 2 |
| 65 < s ≤ 70 | 25 |
| 70 < s ≤ 75 | 30 |
| 75 < s ≤ 90 | 13 |

a) Class boundaries: there are no gaps, so they are the same as in the table, 70 and 75.
b) Midpoint: add up the boundaries and divide by 2, so (70 + 75) ÷ 2 = 72.5 s.
c) Class width: the upper boundary minus the lower boundary, so 75 - 70 = 5 s.
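The two worked examples follow the same steps, which can be sketched as a small helper. This is an illustrative Python sketch; the function name `class_summary` and the `gap` parameter are assumptions introduced here, not notation from the slides.

```python
def class_summary(lower, upper, gap=0.0):
    """Return (lower boundary, upper boundary, midpoint, width) for one class.

    `gap` is the distance between the end of one class and the start of the
    next as written in the table: 1 for classes such as 30-31, 32-33, and
    0 for classes such as 70 < s <= 75 that already meet.
    """
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    midpoint = (lower_boundary + upper_boundary) / 2
    width = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, midpoint, width


# Wing length class 34-36 (classes written with a gap of 1 mm)
wing = class_summary(34, 36, gap=1)

# Time class 70 < s <= 75 (no gap between classes)
time_taken = class_summary(70, 75)

print(wing)
print(time_taken)
```

Here `wing` reproduces the boundaries 33.5 and 36.5, midpoint 35 and width 3, and `time_taken` reproduces 70, 75, 72.5 and 5.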

Teachings for Exercise 2B

Measures of Location (averages)

Mode: the most common value in a set of data.

Median: the middle value when the data is put into ascending order. For n observations, divide n by 2.
- If the result is whole, find the midpoint of the corresponding term and the term above.
- If the result is not whole, round up and find the corresponding term.

E.g. for the median of 14 values, 14 ÷ 2 = 7. This is whole, so the median will be the midpoint of the 7th and 8th terms.
For the median of 29 values, 29 ÷ 2 = 14.5. Round up to 15 and find the 15th value.
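The rule for locating the median can be written as a short function. This is a sketch in Python under the assumption that positions are counted from 1, as on the slide; the name `median_positions` is made up for illustration.

```python
def median_positions(n):
    """Return the 1-based position(s) of the median among n ordered values,
    following the 'divide n by 2' rule described above."""
    half, remainder = divmod(n, 2)
    if remainder == 0:
        # n / 2 is whole: use that term and the term above it
        return (half, half + 1)
    # n / 2 is not whole: round up to the next term
    return (half + 1,)


print(median_positions(14))  # positions for 14 values
print(median_positions(29))  # position for 29 values
```

When two positions are returned, the median is the midpoint of the values at those two positions.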

Measures of Location (averages)

Mean: the sum of the observations divided by the total number of observations. This is written as:

$$\bar{x} = \frac{\sum x}{n}$$

- The symbol $\sum$ means 'the sum of'
- The $x$ represents the observations
- The $n$ stands for the number of observations

Often, the mean is denoted by $\bar{x}$ (x-bar).

Measures of Location (averages)

Calculate the mean, median and mode of the set of data below:

2, 6, 18, 21, 16, 17, 6, 5, 5, 1, 5, 3

a) Mode: 5
b) Median: in order, the data is 1, 2, 3, 5, 5, 5, 6, 6, 16, 17, 18, 21. 12 ÷ 2 = 6, so find the mid-point of the 6th and 7th terms (5 and 6), which is 5.5.
c) Mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{105}{12} = 8.75$$

You must get into the habit of showing workings like this!
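The three answers can be checked with Python's standard `statistics` module. This is only a quick verification sketch, not a substitute for showing the working by hand.

```python
import statistics

# The data set from the slide
data = [2, 6, 18, 21, 16, 17, 6, 5, 5, 1, 5, 3]

mode = statistics.mode(data)      # most common value
median = statistics.median(data)  # midpoint of the 6th and 7th ordered values
mean = statistics.mean(data)      # sum of the values divided by how many there are

print(sorted(data))
print(mode, median, mean)
```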

Measures of Location (averages)

Ben collects 8 pieces of data and calculates that $\sum x$ is 13.5. Calculate the mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{13.5}{8} = 1.69 \text{ (2dp)}$$

Measures of Location (averages)

You need to be able to calculate a combined mean.

1) If the mean pay of 20 workers is £5 per hour, and the mean pay of a different 20 workers is £6 per hour, what is the overall mean?

The two groups are the same size, so the overall mean is the midpoint of £5 and £6, which is £5.50.

2) If the mean pay of 5 workers is £8 per hour and the mean pay of a different 12 workers is £6 per hour, what is the overall mean?

This is not as simple, because the groups are different sizes. You need to work out the total pay and the total number of people.
- Total pay: (5 × £8) + (12 × £6) = £112
- Total people: 5 + 12 = 17
- Overall mean: £112 ÷ 17 = £6.59 (2dp)

Measures of Location (averages)

You need to be able to calculate a combined mean. In general, you can use a formula. If data set 1 has $n_1$ observations and mean $\bar{x}_1$, and data set 2 has $n_2$ observations and mean $\bar{x}_2$, then the overall mean is:

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$

Here $n_1\bar{x}_1$ is the mean of set 1 multiplied by the number of observations in set 1, $n_2\bar{x}_2$ is the mean of set 2 multiplied by the number of observations in set 2, and $n_1 + n_2$ is the total number of observations.

Measures of Location (averages)

Using the formula: a sample of 25 observations has a mean of 6.4. The mean of a second sample is 7.2, with 30 observations. Calculate the overall mean.

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{(25 \times 6.4) + (30 \times 7.2)}{25 + 30} = \frac{376}{55} = 6.84 \text{ (2dp)}$$

You must get into the habit of showing workings like this!
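The combined mean calculation is the same each time, so it can be sketched as one small function. This is an illustrative Python sketch; the function name `combined_mean` and its parameter names are assumptions chosen to mirror the quantities in the formula.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Overall mean of two data sets, given each set's size and mean."""
    # Each product recovers the total of one data set
    total = n1 * mean1 + n2 * mean2
    return total / (n1 + n2)


# Two samples: 25 observations with mean 6.4, 30 observations with mean 7.2
samples = combined_mean(25, 6.4, 30, 7.2)

# Pay example: 5 workers at 8 pounds per hour, 12 workers at 6 pounds per hour
pay = combined_mean(5, 8, 12, 6)

print(round(samples, 2), round(pay, 2))
```

When the two sets are the same size, the function gives the midpoint of the two means, as in the first pay example.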

Measures of location (averages)

You should realise that the 3 measures of location have different advantages and disadvantages.

- Mode: can be used with any data, qualitative or quantitative. It is of no use when there isn't a common value.
- Median: used with quantitative data and is unaffected by extreme values. However, it only uses the middle value(s).
- Mean: uses all the data, but can be affected by extreme values.

Teachings for Exercise 2C

Measures of location (averages) from tables

Rebecca records the shirt collar size, x, of male students in her year group. Her results are in the table.

| Collar size, x | Number of students, f |
|---|---|
| 15 | 3 |
| 15.5 | 17 |
| 16 | 29 |
| 16.5 | 34 |
| 17 | 12 |

Find the modal collar size.

The mode is 16.5, as this is the collar size which occurred most often (34 times).

Measures of location (averages) from tables

Find the median collar size.

- Fill in the Cumulative Frequency column
- Total ÷ 2: 95 ÷ 2 = 47.5, so the median will be the 48th value
- Find which group the 48th value is in, using the Cumulative Frequency column. The 21st to 49th values are all 16.
- The median is 16

| Collar size, x | Number of students, f | Cumulative Frequency |
|---|---|---|
| 15 | 3 | 3 |
| 15.5 | 17 | 20 |
| 16 | 29 | 49 |
| 16.5 | 34 | 83 |
| 17 | 12 | 95 |

Measures of location (averages) from tables

Find the mean collar size.

| Collar size, x | Number of students, f | fx |
|---|---|---|
| 15 | 3 | 45 |
| 15.5 | 17 | 263.5 |
| 16 | 29 | 464 |
| 16.5 | 34 | 561 |
| 17 | 12 | 204 |
| Total | 95 | 1537.5 |

Sum of collar sizes ÷ total students = 1537.5 ÷ 95 = 16.18 (2dp)

Measures of location (averages) from tables

Find the mean collar size. This is the formula you are actually using:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

Here $\sum fx$ is the sum of 'f times x' (the total of the fx column, 1537.5) and $\sum f$ is the sum of 'f' (the total of the frequency column, 95).
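All three averages from the collar-size table can be reproduced in a few lines. This is a minimal sketch in Python; the dictionary layout and the helper name `value_at` are assumptions made for illustration.

```python
# Collar-size table from the slides: size x -> number of students f
table = {15: 3, 15.5: 17, 16: 29, 16.5: 34, 17: 12}

# Mean: sum of 'f times x' divided by sum of 'f'
total_f = sum(table.values())
total_fx = sum(x * f for x, f in table.items())
mean = total_fx / total_f

# Mode: the value with the highest frequency
mode = max(table, key=table.get)


def value_at(position, freq_table):
    """Return the value at a 1-based position, using cumulative frequency."""
    running = 0
    for x in sorted(freq_table):
        running += freq_table[x]
        if position <= running:
            return x
    raise ValueError("position exceeds total frequency")


# Median: n / 2 whole -> midpoint of two terms, otherwise round up
if total_f % 2 == 0:
    median = (value_at(total_f // 2, table) + value_at(total_f // 2 + 1, table)) / 2
else:
    median = value_at(total_f // 2 + 1, table)

print(mode, median, round(mean, 2))
```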

Teachings for Exercise 2D

Measures of location (averages) from grouped tables

All grouped data is treated as continuous data, and you need to be able to calculate all 3 averages from this kind of table.

The mode works in essentially the same way: the modal class is the group with the highest frequency.

We will be focusing on the median and mean. It is important to know that when data is grouped, you do not know the actual values. Therefore, the median and mean from a grouped table are only estimates and are not necessarily accurate.

Mean from a grouped table

To calculate the mean from a grouped table, we use the same formula as for an ungrouped table. The difference is that x is now the midpoint of each class, rather than the actual values.

| Length of Pine Cone (mm) | Number of Cones, f |
|---|---|
| 30-31 | 2 |
| 32-33 | 25 |
| 34-36 | 30 |
| 37-39 | 13 |

Mean from a grouped table

Fill in 2 columns on the table (sometimes you will have to remember which columns you need).

| Length of Pine Cone (mm) | Number of Cones, f | Midpoint, x | fx |
|---|---|---|---|
| 30-31 | 2 | 30.5 | 61 |
| 32-33 | 25 | 32.5 | 812.5 |
| 34-36 | 30 | 35 | 1050 |
| 37-39 | 13 | 38 | 494 |
| Total | 70 | | 2417.5 |

The estimate of the mean is then $\bar{x} = \frac{\sum fx}{\sum f} = \frac{2417.5}{70} = 34.54$ mm (2dp).
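The midpoint and fx columns can be generated directly from the class limits. This is a sketch in Python using the pine cone table; the list layout and variable names are assumptions for illustration.

```python
# Pine cone table from the slide: ((lower limit, upper limit), frequency)
classes = [((30, 31), 2), ((32, 33), 25), ((34, 36), 30), ((37, 39), 13)]

total_f = 0
total_fx = 0.0
for (low, high), f in classes:
    # The midpoint of the written limits equals the midpoint of the boundaries
    midpoint = (low + high) / 2
    total_f += f
    total_fx += f * midpoint
    print(low, high, f, midpoint, f * midpoint)

# Estimate of the mean: sum of fx divided by sum of f
estimated_mean = total_fx / total_f
print(round(estimated_mean, 2))
```

The result is only an estimate, because each cone is treated as if its length were exactly the midpoint of its class.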

Median from a grouped table

We will be using a formula to estimate the median, but first we will try to understand the process. First, find which group it is in:
- Complete the Cumulative Frequency column
- 70 ÷ 2 = 35, so we want the 35th value (for continuous data you just divide by 2)
- It will be in the 34-36 group

| Length of Pine Cone (mm) | Number of Cones, f | Cumulative Frequency |
|---|---|---|
| 30-31 | 2 | 2 |
| 32-33 | 25 | 27 |
| 34-36 | 30 | 57 |
| 37-39 | 13 | 70 |

Our next step is to consider 'how far' the median will be into the group.

Median from a grouped table

We have had 27 observations so far (the cumulative frequency at the end of the 32-33 group). The 35th value is in the 34-36 group, which runs from 33.5 to 36.5 and contains 30 values, so there are 35 - 27 = 8 values to go.

The median will be 8/30ths of the way into a group with a class width of 3. 8/30 of 3 = 0.8, so the median is 0.8 into the group.

Median from a grouped table

The median is 0.8 into the group, and the lower boundary of the group is 33.5.

33.5 + 0.8 = 34.3

So our estimate of the median is 34.3. This process is known as interpolation.

Median from a grouped table

The formula (most important bit!):

$$\text{Median} = \text{Lower boundary} + \left(\frac{\text{Places into group}}{\text{Group frequency}}\right) \times \text{Class width}$$

For the 35th value, in the 34-36 group:

$$33.5 + \left(\frac{8}{30}\right) \times 3 = 34.3$$

You must get into the habit of showing workings like this!
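The interpolation steps can be sketched as a function that finds the median class and then moves the right fraction of the way into it. This is an illustrative Python sketch; the function name `interpolated_median` and the tuple layout (lower boundary, upper boundary, frequency) are assumptions, and the boundaries below are the pine cone classes widened by 0.5 at each end.

```python
def interpolated_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    `classes` is an ordered list of (lower boundary, upper boundary, frequency).
    """
    total = sum(f for _, _, f in classes)
    target = total / 2  # for continuous data, just divide by 2
    cumulative = 0
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            places_into_group = target - cumulative
            class_width = upper - lower
            return lower + (places_into_group / f) * class_width
        cumulative += f
    raise ValueError("table has no observations")


# Pine cone classes 30-31, 32-33, 34-36, 37-39 written with class boundaries
pine_cones = [(29.5, 31.5, 2), (31.5, 33.5, 25), (33.5, 36.5, 30), (36.5, 39.5, 13)]

median_estimate = interpolated_median(pine_cones)
print(round(median_estimate, 1))
```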

Teachings for Exercise 2E

Coding

You need to understand why data is coded, how to code it and how to un-code it.

- Coding is done before any average is calculated, and is usually used with large data values in order to simplify calculations
- Once the data has been coded, the averages are calculated
- After the average is worked out, the code is reversed in order to give the actual average

Coding

Use the following coding to calculate the mean of the data below:

110, 120, 130, 140, 150

Coding: $y = x - 100$, where x represents the original value and y is the coded value.

This code is telling us to subtract 100 from all the numbers before calculating the mean:

10, 20, 30, 40, 50

The mean of these numbers is 30. However, as 100 was subtracted, you must now undo this to get the correct mean: 30 + 100 = 130.

So the mean of the original set of data is 130.

Coding

Use the following coding to calculate the mean of the data below:

110, 120, 130, 140, 150

Coding: $y = \frac{x - 100}{10}$, where x represents the original value and y is the coded value.

This code is telling us to subtract 100 from all the numbers, and then divide by 10, before calculating the mean:

1, 2, 3, 4, 5

The mean of these numbers is 3. We subtracted 100 then divided by 10, so to undo this we must multiply by 10 then add 100: (3 × 10) + 100 = 130.

So the mean of the original set of data is 130.
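Coding and un-coding can be checked numerically. This is a small Python sketch of the example above, assuming the code is "subtract 100, then divide by 10" as described; the variable names are made up for illustration.

```python
# Original data from the slide
data = [110, 120, 130, 140, 150]

# Code each value: subtract 100, then divide by 10
coded = [(x - 100) / 10 for x in data]

# Mean of the coded values
coded_mean = sum(coded) / len(coded)

# Reverse the code in the opposite order: multiply by 10, then add 100
uncoded_mean = coded_mean * 10 + 100

# Mean calculated directly from the original data, for comparison
direct_mean = sum(data) / len(data)

print(coded, coded_mean, uncoded_mean, direct_mean)
```

The un-coded mean agrees with the mean worked out directly from the original values, which is the point of reversing the code.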

Coding

Use the code $y = \frac{x - 7.5}{5}$ to estimate the mean of this set of grouped data on the lengths of phone calls.

First the midpoints (x) must be turned into new values (y) using the code.

| Time (mins) | Calls, f | Midpoint, x | y | fy |
|---|---|---|---|---|
| 0-5 | 4 | 2.5 | -1 | -4 |
| 5-10 | 15 | 7.5 | 0 | 0 |
| 10-15 | 5 | 12.5 | 1 | 5 |
| 15-20 | 2 | 17.5 | 2 | 4 |
| 20-60 | 0 | 40 | 6.5 | 0 |
| 60-70 | 1 | 65 | 11.5 | 11.5 |
| Total | 27 | | | 16.5 |

We are now working out the mean, so use the formula for this:

$$\bar{y} = \frac{\sum fy}{\sum f} = \frac{16.5}{27} = 0.6111 \text{ (4dp)}$$

Coding

We calculated a mean of 0.61111 using the code.
- To code the data, we subtracted 7.5 and then divided by 5
- We therefore need to multiply by 5 and then add 7.5
- (0.61111 × 5) + 7.5 = 10.5555...

The mean for the original data ($\bar{x}$) is 10.56 minutes (2dp).
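The whole grouped coding calculation can be sketched in Python as a check. The midpoints and frequencies come from the phone call table; the frequency of 0 for the 20-60 class is an assumption inferred from the total of 27 calls, and the variable names are made up for illustration.

```python
# (midpoint x, number of calls f) for the classes 0-5, 5-10, 10-15, 15-20, 20-60, 60-70
# Assumption: the 20-60 class has 0 calls, inferred from the total of 27
rows = [(2.5, 4), (7.5, 15), (12.5, 5), (17.5, 2), (40, 0), (65, 1)]

# Code the midpoints: subtract 7.5, then divide by 5
coded_rows = [((x - 7.5) / 5, f) for x, f in rows]

total_f = sum(f for _, f in coded_rows)
total_fy = sum(y * f for y, f in coded_rows)
coded_mean = total_fy / total_f

# Reverse the code: multiply by 5, then add 7.5
estimated_mean = coded_mean * 5 + 7.5

# Estimate without coding, for comparison
direct_mean = sum(x * f for x, f in rows) / total_f

print(total_f, total_fy, round(coded_mean, 4), round(estimated_mean, 2))
```

The coded and direct routes give the same estimate, so coding changes the arithmetic but not the answer.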

Summary

- We have now covered all of chapter 2
- We have seen the 3 measures of location (averages)
- We have seen how to calculate them in tables, using midpoints and interpolation where appropriate
- We have looked at combined means
- We have also used coding in answering questions