A Look at Means, Variances, Standard Deviations, and z-Scores

Slides:



Advertisements
Similar presentations
CHAPTER 1 Exploring Data
Advertisements

1 1 Slide © 2003 South-Western/Thomson Learning TM Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
12.3 – Measures of Dispersion
1 1 Slide © 2003 South-Western/Thomson Learning TM Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Lecture 4 Dustin Lueker.  The population distribution for a continuous variable is usually represented by a smooth curve ◦ Like a histogram that gets.
Describing Data: Numerical
Chapter 2 Describing Data with Numerical Measurements
AP Statistics Chapters 0 & 1 Review. Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of.
Department of Quantitative Methods & Information Systems
Chapter 6.
Chapter 2 Describing Data with Numerical Measurements General Objectives: Graphs are extremely useful for the visual description of a data set. However,
LECTURE 12 Tuesday, 6 October STA291 Fall Five-Number Summary (Review) 2 Maximum, Upper Quartile, Median, Lower Quartile, Minimum Statistical Software.
Chapter 3 - Part B Descriptive Statistics: Numerical Methods
CHAPTER 2: Describing Distributions with Numbers ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture Presentation.
Descriptive Statistics
Numerical Descriptive Techniques
Chapter 3 – Descriptive Statistics
A Closer Look at the NEW High School Statistics Standards Focus on A.9 and AII.11 K-12 Mathematics Institutes Fall 2010 A Closer Look at the NEW High School.
Warm Up Solve for x 2) 2x + 80 The product of a number
LECTURE 8 Thursday, 19 February STA291 Fall 2008.
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved. 1 Understandable Statistics Seventh Edition By Brase and Brase Prepared by: Lynn Smith.
Descriptive Statistics Measures of Variation. Essentials: Measures of Variation (Variation – a must for statistical analysis.) Know the types of measures.
 The range of a data set is the difference between the maximum and minimum data entries in the set. The find the range, the data must be quantitative.
1 MATB344 Applied Statistics Chapter 2 Describing Data with Numerical Measures.
What is variability in data? Measuring how much the group as a whole deviates from the center. Gives you an indication of what is the spread of the data.
1 1 Slide Slides Prepared by JOHN S. LOUCKS St. Edward’s University © 2002 South-Western/Thomson Learning.
Objectives The student will be able to: find the variance of a data set. find the standard deviation of a data set.
Applied Quantitative Analysis and Practices LECTURE#09 By Dr. Osman Sadiq Paracha.
1 CHAPTER 3 NUMERICAL DESCRIPTIVE MEASURES. 2 MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA  In Chapter 2, we used tables and graphs to summarize a.
Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.
Understanding Basic Statistics Fourth Edition By Brase and Brase Prepared by: Lynn Smith Gloucester County College Chapter Three Averages and Variation.
Descriptive Statistics: Presenting and Describing Data.
Measures of Center vs Measures of Spread
 The mean is typically what is meant by the word “average.” The mean is perhaps the most common measure of central tendency.  The sample mean is written.
Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 1 Chapter Descriptive Statistics 2.
Lecture 4 Dustin Lueker.  The population distribution for a continuous variable is usually represented by a smooth curve ◦ Like a histogram that gets.
Measures of Variation 1 Section 2.4. Section 2.4 Objectives 2 Determine the range of a data set Determine the variance and standard deviation of a population.
Section 3-2 Measures of Variation.
Chapter 2 Descriptive Statistics 1 Larson/Farber 4th ed.
1.3 Describing Quantitative Data with Numbers Pages Objectives SWBAT: 1)Calculate measures of center (mean, median). 2)Calculate and interpret measures.
Section 2.4 Measures of Variation. Section 2.4 Objectives Determine the range of a data set Determine the variance and standard deviation of a population.
Statistics topics from both Math 1 and Math 2, both featured on the GHSGT.
Copyright © 2016 Brooks/Cole Cengage Learning Intro to Statistics Part II Descriptive Statistics Intro to Statistics Part II Descriptive Statistics Ernesto.
Section 2.4 Measures of Variation 1 of 149 © 2012 Pearson Education, Inc. All rights reserved.
Statistics Unit Test Review Chapters 11 & /11-2 Mean(average): the sum of the data divided by the number of pieces of data Median: the value appearing.
Describing Data: Summary Measures. Identifying the Scale of Measurement Before you analyze the data, identify the measurement scale for each variable.
Descriptive Statistics Measures of Variation
Intro to Statistics Part II Descriptive Statistics
Statistics Unit Test Review
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Descriptive Statistics
Please take out Sec HW It is worth 20 points (2 pts
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
CHAPTER 1 Exploring Data
Describing Quantitative Data with Numbers
Chapter 1: Exploring Data
Section 2.4 Measures of Variation.
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Presentation transcript:

A Look at Means, Variances, Standard Deviations, and z-Scores Dr. Margie Mason College of William and Mary mmmaso@wm.edu

Based on the Technical Assistance Document 2009 Algebra I Standard of Learning A.9 http://www.doe.virginia.gov/instruction/high_school/mathematics/technical_assistance_algebra1_a9.pdf

Algebra Standard of Learning A.9 The student, given a set of data, will interpret variation in real-world contexts and calculate and interpret mean absolute deviation, standard deviation, and z-scores.

Mathematics SOL 5.16 5.16 The student will a) describe mean, median, and mode as measures of center; b) describe mean as fair share; c) find the mean, median, mode, and range of a set of data; and d) describe the range of a set of data as a measure of variation.

Algebra SOL A.10 The student will compare and contrast multiple univariate data sets, using box-and-whisker plots.

Let’s use unifix stick heights of 5 6 8 8 10 13 15 17 18 20 Algebra SOL A.10 Let’s use unifix stick heights of 5 6 8 8 10 13 15 17 18 20

Determining a Box-and-Whisker Plot Find: Median – the middle value when arranged from smallest to largest. 11.5 Lower extreme (LE) – the smallest value 5 Upper extreme (UE) – the largest value 20 Lower quartile (LQ) – the value halfway between the lower extreme and the median. 8 Upper quartile (UQ) – the value halfway between the upper extreme and the median. 17 Interquartile range (IQR) – The difference between the upper quartile and the lower quartile. 9

Determining a Box-and-Whisker Plot Determine and draw the scale. Subtract the smallest value from the largest value to determine the range. Choose a reasonable size for the intervals based on the range to be covered, e.g., 1, 2, 5, or 10. Draw the scale much like a number line at the bottom of your plot.

Determining a Box-and-Whisker Plot 3. Draw the box: a. Length of the box extends for LQ to UQ. It is drawn above the scale. b. Mark the median. c. Width of box can be anything.

Determining a Box-and-Whisker Plot 4. Draw the whiskers: a. Draw from the box you just drew to LE and UE.

Determining a Box-and-Whisker Plot Alternate method for drawing the whiskers: 4. Determine the outliers: Multiply the IQR by 1.5 and add this number to the UQ and subtract it from the LQ. Any values outside these limits are outliers.

Determining a Box-and-Whisker Plot Alternate Method: 5. Draw whiskers: a. Draw from box to LE and UE excluding outliers. b. Place asterisks on any outliers.

Determining a Box-and-Whisker Plot How many years of experience as a teacher do you have?

Mathematics SOL 6.15 The student will a) describe mean as balance point; and b) decide which measure of center is appropriate for a given purpose.

Mathematics SOL 6.15 The mean is the numerical average of the data set and is found by adding the numbers in the data set together and dividing the sum by the number of data pieces in the set. In grade 5 mathematics, mean is defined as fair-share.

Mathematics SOL 6.15 Mean can be defined as the point on a number line where the data distribution is balanced. This means that the sum of the distances from the mean of all the points above the mean is equal to the sum of the distances of all the data points below the mean. This is the concept of mean as the balance point. Defining mean as balance point is a prerequisite for understanding standard deviation.

Mathematics SOL 6.15 Balance Point: The sum of the distances on a number line from the mean of all the points above the mean is equal to the sum of the distances of all the data points below the mean. 7 + 6 + 4 + 4 + 2 = 1 + 3 + 5 + 6 + 8

Sample vs. Population Data A statistical population includes all elements in the set. A sample is a subset of the population. A data set, whether a sample or population, is comprised of individual data points referred to as elements of the data set. Start with small defined population data sets of approximately 30 items or less.

*Elements* An element of a data set will be represented as xi. Where i represents the ith term of the data set.

Sample vs. Population Data The arithmetic mean of a population is represented by the Greek letter (mu), while the calculated arithmetic mean of a sample is represented by , read “x bar.”

Mean Absolute Deviation vs. Variance and Standard Deviation Measuring dispersion or spread of a data set about the mean One measure of spread is to find the sum of the deviations between each element and the mean; however, this sum is always zero.

Mean Absolute Deviation vs. Variance and Standard Deviation Two methods: take the absolute value of the deviations before finding the average, (Mean Absolute Deviation) or square the deviations before find the average (Variance and Standard Deviation)

Mean Absolute Deviation vs. Variance and Standard Deviation Summation Notation

Mean Absolute Deviation The arithmetic mean of the absolute values of the deviations of elements from the mean of a data set. 5 6 8 8 10 13 15 17 18 20 |5-12| + |6-12| + |8-12| + |8-12| +|10-12| + |13-12| + |15-12| +|17-12| + |18-12| + |20-12| 10 = 7 + 6 + 4 + 4 + 2 + 1 + 3 + 5 + 6 + 8 10  = 46 = 4.6

*Mean Absolute Deviation* where represents the mean of the data set, n represents the number of elements in the data set, and xi represents the ith element of the data set.

Mean Absolute Deviation Mean absolute deviation is less affected by outlier data than the variance and standard deviation. Outliers are elements that fall at least 1.5 times the interquartile range (IQR) below the first quartile (Q1) or above the third quartile (Q3).

Variance The average of the squared deviations from the mean is known as the variance and is another measure of the spread of the elements in the set. (5-12)2 + (6-12)2 + (8-12)2 + (8-12)2 +(10-12)2 + (13-12)2 + (15-12)2 +(17-12)2 + (18-12)2 + (20-12)2 10 = (-7)2 + (-6) 2 + (-4) 2 + (-4) 2 + (-2) 2 + 12 + 32 + 52 + 62 + 82   = 49 + 36 + 16 + 16 + 4 + 1 + 9 + 25 + 36 + 64 = 256 = 25.6 10 10

*Variance* Variance (s 2)= where represents the mean of the data set, n represents number of elements in the data set, and xi represents the ith element of the data set.

Variance The differences are squares so that they don’t cancel each other out when finding the sum. When squaring the differences, the units of measure are squared and the larger differences are “weighted” more heavily than smaller differences. In order to provide a measure of variation in terms of the original units of the date, the square root of the variance is taken, yielding the standard deviation.

Standard Deviation The positive square root of the variance of the data set. The greater the value, the more spread out the data are about the mean. The lesser (closer to 0) the value, the closer the data are clustered about the mean.

*Standard Deviation* Standard deviation (s)= where represents the mean of the data set, n represents the number of elements in the data set, and xi represents the ith element of the data set.

Standard Deviation “ ”, written and read “sigma”, represents the standard deviation of a population and “s” represents the sample standard deviation. s = the square root of 25.6 = 5.06

Standard Deviation The population standard deviation can be estimated by calculating the sample standard deviation. The formulas for sample and population look similar except that the sample standard deviation formula uses n – 1 instead of n in the denominator. This is to account for the possibility of greater variability of data in the population than what was seen in the sample.

Standard Deviation When n-1 is used in the denominator, the result is a larger number. So the calculated value of the sample standard deviation will be larger than the population standard deviation. As sample sizes get larger, the difference gets smaller. The use of n-1 is known as Bessel’s correction. SOL A.9 used the population standard deviation with n in the denominator.

Interpreting Standard Deviation Standard deviation is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

Interpreting Standard Deviation Empirical Rule (68 -95-99.7 Rule) For data with a (symmetric) bell-shaped distribution, the standard deviation has the following characteristics: About 68% of the data lie within one standard deviation of the mean. About 95% of the data lie within two standard deviations of the mean. About 99.7% of the data lie within three standard deviations of the mean.

Empirical Rule (68 – 95 – 99.7 Rule) 99.7% within 3 standard deviations 2.35% 95% within 2 standard deviations 13.5% 68% within 1 standard deviation 34%

Example: Using the Empirical Rule In a survey conducted by the National Center for Health Statistics, the sample mean height of women in the United States (ages 20-29) was 64.3 inches, with a sample standard deviation of 2.62 inches. Estimate the percent of the women whose heights are between 59.06 inches and 64.3 inches.

Example: Using the Empirical Rule Because the distribution is bell-shaped, you can use the Empirical Rule. 34% + 13.5% = 47.5% of women are between 59.06 and 64.3 inches tall.

Chebychev’s Theorem The portion of any data set lying within k standard deviations (k > 1) of the mean is at least: k = 2: In any data set, at least of the data lie within 2 standard deviations of the mean. k = 3: In any data set, at least of the data lie within 3 standard deviations of the

Using Chebychev’s Theorem The age distribution for Florida is shown in the histogram. Apply Chebychev’s Theorem to the data using k = 2. What can you conclude?

Using Chebychev’s Theorem k = 2: μ – 2σ = 39.2 – 2(24.8) = – 10.4 (use 0 since age can’t be negative) μ + 2σ = 39.2 + 2(24.8) = 88.8 At least 75% of the population of Florida is between 0 and 88.8 years old.

z-Scores A z-score, also called a standard score, is a measure of the position derived from the mean and standard deviation of the data set. In Algebra I, the z-score will be used to determine how many standard deviations an element is above of below the mean of the data set. It can also be used to determine the value of the element, given the z-score of an unknown element and the mean and standard deviation of a data set.

The Standard Score (z-Score) Represents the number of standard deviations a given value x falls from the mean μ.

z-Scores The z-score will be positive if the element lies above the mean and negative if it lies below the mean. A z-score is calculated by subtracting the mean of the data set from the element and dividing the result by the standard deviation of the data set.

*z-Scores* z-score (z) = where x represents an element of the data set, m represents the mean of the data set, and s represents the standard deviation of the data set.

z-Scores z-score of 5 = (5 - 12)/5.06 = -1.38

z-Scores Suppose you had the misfortune to have an Algebra test and a history test on the same day. Why you got your tests back, here is the information given to you regarding your performance and the performance of the class on these exams. Which test did you do better on? Algebra History Your score 82 93 Mean 71.06 85.43 Stan. Dev 10.32 18.91

z-Scores Suppose the mean and standard deviation on an algebra test were given as 72 and 12, respectively. Susan’s z-score on the test was 2.34. Was Susan’s test score above or below the mean? How do you know? David’s z-score on the same test was -1.25. Was David’s test score above or below the mean? How do you know?

z-Scores Suppose the mean and standard deviation on an algebra test were given as 72 and 12, respectively. Dakota had a z-score of 0.08 on the test. What does this z-score tell you about Dakota’s test score relative to the mean?

z-Scores Suppose the mean and standard deviation on an algebra test were given as 72 and 12, respectively. If the z-score for Susan was 2.34, for David was -1.25, and for Dakota was 0.08, find their actual test scores.

z-Scores Suppose the mean and standard deviation on an algebra test were given as 72 and 12, respectively. Rebecca made an 80 on the biology test. Find her z-score.

Interpretation of Descriptive Statistics Data set 1 Data set 2 Number of Basketball Players Recorded Once Each Day from April 1-14 Number of Basketball Players Recorded Once Each Day from April 15-28 7 6 5 4 3 2 1 1-10 11-20 21-30 31-40 41-50 51-60 61-70 Number of Players Mean = 45.0 Variance = 106.1 Standard Deviation = 10.3 Mean Absolute Deviation = 9.1 Variance = 420.3 Standard Deviation = 20.5 Mean Absolute Deviation = 16

Interpretation of Descriptive Statistics Maya represented the heights of boys in Mrs. Constantine’s and Mr. Kluge’s classes on a line plot and calculated the mean and standard deviation. Heights of Boys in Mrs. Constantine’s and Mr. Kluge’s Classes (in inches) x x x x x x x x x x x x x x x x x x x x 64 65 66 67 68 69 70 71 72 73 Mean = 68.4 Standard Deviation = 2.3 Note: In this problem, a small, defined population of the boys in Mrs. Constantine’s and Mr. Kluge’s classes is assumed.