Measures of Center.

Slides:



Advertisements
Similar presentations
CHAPTER 1 Exploring Data
Advertisements

CHAPTER 2: Describing Distributions with Numbers
CHAPTER 2: Describing Distributions with Numbers ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture Presentation.
1.3: Describing Quantitative Data with Numbers
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
+ Chapter 1: Exploring Data Section 1.3 Describing Quantitative Data with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
Chapter 3 Looking at Data: Distributions Chapter Three
+ Chapter 1: Exploring Data Section 1.3 Describing Quantitative Data with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
1.3 Describing Quantitative Data with Numbers Pages Objectives SWBAT: 1)Calculate measures of center (mean, median). 2)Calculate and interpret measures.
+ Chapter 1: Exploring Data Section 1.3 Describing Quantitative Data with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
+ Chapter 1: Exploring Data Section 1.3 Describing Quantitative Data with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
+ Chapter 1: Exploring Data Section 1.3 Describing Quantitative Data with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
+ Chapter 1: Exploring Data Section 1.3 Describing Quantitative Data with Numbers.
+ Chapter 1: Exploring Data Section 1.3 Describing Quantitative Data with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE.
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
MAT 446 Supplementary Note for Ch 1
Describing Distributions Numerically
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 2: Describing Distributions with Numbers
Chapter 1: Exploring Data
CHAPTER 2: Describing Distributions with Numbers
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
DAY 3 Sections 1.2 and 1.3.
Please take out Sec HW It is worth 20 points (2 pts
Warmup What is the shape of the distribution? Will the mean be smaller or larger than the median (don’t calculate) What is the median? Calculate the.
Warmup Draw a stemplot Describe the distribution (SOCS)
CHAPTER 1 Exploring Data
1.3 Describing Quantitative Data with Numbers
Describing Quantitative Data with Numbers
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 2: Describing Distributions with Numbers
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 2: Describing Distributions with Numbers
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
The Five-Number Summary
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
Compare and contrast histograms to bar graphs
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
The Practice of Statistics, 4th edition STARNES, YATES, MOORE
Chapter 1: Exploring Data
Presentation transcript:

Measures of Center

Measuring Center: The Mean The most common measure of center is the ordinary arithmetic average, or mean. To find the mean (pronounced “x-bar”) of a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, x3, …, xn, their mean is: In mathematics, the capital Greek letter Σ is short for “add them all up.” Therefore, the formula for the mean can be written in more compact notation:

SUPPOSE THAT AN INSTRUCTOR IS TEACHING TWO SECTIONS OF A COURSE AND THAT SHE CALCULATES THE MEAN EXAM SCORE TO BE 60 FOR SECTION 1 AND 90 FOR SECTION 2 Do you have enough information to determine the mean exam score for the two sections combined? Explain What can you say with certainty about the value of the overall mean for the two sections combined? Without seeing all of the individual students’ exam scores, what information would you need to be able to calculate the overall mean? Suppose that section 1 contains 20 students and section 2 contains 30 students. Calculate the overall mean exam score. Is the overall mean closer to 60 or 90? Give an example of sample sizes for the two sections for which the overall mean turns out to be less than 65. If you do not know the number of students in the sections but do know that there is the same number of students in the 2 sections, can you determine the overall mean? Explain how it could happen that a student could transfer from section 1 to section 2 and cause the mean score for each section to decrease.

Measuring Center: The Median Another common measure of center is the median. The median describes the midpoint of a distribution. The median is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. To find the median of a distribution: Arrange all observations from smallest to largest. If the number of observations n is odd, the median is the center observation in the ordered list. If the number of observations n is even, the median is the average of the two center observations in the ordered list.

Finding the Center: The Median The median is the value with exactly half the data values below it and half above it. It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas. It has the same units as the data.

Mean Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance; the median is the equal area point.

Measures of Central Tendency Mode – the observation that occurs the most often Can be more than one mode If all values occur only once – there is no mode Not used as often as mean & median

Another Measure of Center As a measure of center, the midrange may also be used (the average of the minimum and maximum values). However it is very sensitive to skewed distributions and outliers. The median is a more reasonable choice for center than the midrange in skewed distributions.

Run 1-Vars Stats on your list Using the calculator . . . Enter the data in a list Go to LIST Menu Highlight MATH Find your function OR Go to Stat Menu Highlight Calc Run 1-Vars Stats on your list

Describing Quantitative Data Measuring Center Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers. Describing Quantitative Data 10 30 5 25 40 20 15 85 65 60 45 0 5 1 005555 2 0005 3 00 4 005 5 6 005 7 8 5 Key: 4|5 represents a New York worker who reported a 45- minute travel time to work.

Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median. The numbers are in order & n is odd – so find the middle observation. The median is 4 lollipops! 2 3 4 8 12

2 3 4 6 8 12 Now suppose we had 6 customers who bought the following number of lollipops 2 3 4 6 8 12 Find the mean and median number of lollipops

2 3 4 6 8 20 5 7.17 The median is . . . The mean is . . . What would happen to the median & mean if the 12 lollipops were 20? 5 The median is . . . 7.17 The mean is . . . What happened? 2 3 4 6 8 20

2 3 4 6 8 50 5 12.17 The median is . . . The mean is . . . What would happen to the median & mean if the 20 lollipops were 50? 5 The median is . . . 12.17 The mean is . . . What happened? 2 3 4 6 8 50

Resistant - Statistics that are not affected by extreme values (outliers) Is the median resistant? YES Is the mean resistant? NO

Comparing the mean and the median The mean and the median are the same only if the distribution is symmetrical. Even in a skewed distribution, the median remains at the center point, the mean however, is pulled in the direction of the skew. Mean and median for a symmetric distribution Mean Median Mean and median for skewed distributions Left skew Mean Median Right skew Mean Median

Trimmed mean: To calculate a trimmed mean: Multiply the % to trim by n Truncate that many observations from BOTH ends of the distribution (when listed in order) Calculate the mean with the shortened data set

So remove one observation from each side! First find the mean of the data then find a 10% trimmed mean with the following data. 12 14 19 20 22 24 25 26 26 55 10%(10) = 1 So remove one observation from each side!

WEIGHTED MEAN Midterm --- 92 Paper ---- 80 Final --- 88 Find your semester average if the Midterm is weighed 25%, the paper 25% & the Final 50% .25(92) + .25(80) + .5(88) =

WEIGHTED MEAN Weighted Mean is an average computed by giving different weights to some of the individual values. If all the weights are equal, then the weighted mean is the same as the arithmetic mean. x is each data value w is the number of occurrences of x (weight) x̄ is the weighted mean

Variability

NOW TAKE A LOOK AT COMPARING THE DOT PLOTS CONSIDER THE FOLLOWING 3 SAMPLE DATA SETS: I 20 40 50 30 60 70 II 47 43 44 46 20 70 III 44 43 40 50 46 47 COMPUTE THE RANGE, MEDIAN & MEAN FOR EACH DATA SET WHAT DO YOU NOTICE??? NOW TAKE A LOOK AT COMPARING THE DOT PLOTS

Why is the study of variability important? Allows us to distinguish between usual & unusual values In some situations, want more/less variability When describing data, never rely on center alone Like Measures of Center, you must choose the most appropriate measure of spread.

Measures of Variability range (max-min) interquartile range (Q3-Q1) deviations variance standard deviation Lower case Greek letter sigma

Describing Quantitative Data Measuring Spread: The Interquartile Range (IQR) A measure of center alone can be misleading. A useful numerical description of a distribution requires both a measure of center and a measure of spread. Describing Quantitative Data How to Calculate the Quartiles and the Interquartile Range To calculate the quartiles: Arrange the observations in increasing order and locate the median M. The first quartile Q1 is the median of the observations located to the left of the median in the ordered list. The third quartile Q3 is the median of the observations located to the right of the median in the ordered list. The interquartile range (IQR) is defined as: IQR = Q3 – Q1

Describing Quantitative Data Find and Interpret the IQR Describing Quantitative Data Travel times to work for 20 randomly selected New Yorkers 10 30 5 25 40 20 15 85 65 60 45 5 10 15 20 25 30 40 45 60 65 85 5 10 15 20 25 30 40 45 60 65 85 Q1 = 15 M = 22.5 Q3= 42.5 IQR = Q3 – Q1 = 42.5 – 15 = 27.5 minutes Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.

Describing Quantitative Data Identifying Outliers In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers. Describing Quantitative Data Definition: The 1.5 x IQR Rule for Outliers Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile. Example, page 57 In the New York travel time data, we found Q1=15 minutes, Q3=42.5 minutes, and IQR=27.5 minutes. For these data, 1.5 x IQR = 1.5(27.5) = 41.25 Q1 - 1.5 x IQR = 15 – 41.25 = -26.25 Q3+ 1.5 x IQR = 42.5 + 41.25 = 83.75 Any travel time shorter than -26.25 minutes or longer than 83.75 minutes is considered an outlier. 0 5 1 005555 2 0005 3 00 4 005 5 6 005 7 8 5

The Five-Number Summary The minimum and maximum values alone tell us little about the distribution as a whole. Likewise, the median and quartiles tell us little about the tails of a distribution. To get a quick summary of both center and spread, combine all five numbers. Describing Quantitative Data Definition: The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. Minimum Q1 M Q3 Maximum

Boxplots (Box-and-Whisker Plots) The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot. How To Make A Boxplot: A central box is drawn from the first quartile (Q1) to the third quartile (Q3). A line in the box marks the median. Lines (called whiskers) extend from the box out to the smallest and largest observations that are not outliers. Outliers are marked with a special symbol such as an asterisk (*).

Recall, this is an outlier by the Construct a Boxplot Consider our New York travel time data: 10 30 5 25 40 20 15 85 65 60 45 5 10 15 20 25 30 40 45 60 65 85 Min=5 Q1 = 15 Median = 22.5 Q3= 42.5 Max=85 Recall, this is an outlier by the 1.5 x IQR rule

Measuring Spread: The Standard Deviation When we use the mean as our measure of center, we need another measure of spread. The most common measure of spread looks at how far each observation is from the mean. This measure is called the standard deviation. Consider the following data on the number of pets owned by a group of 9 children. Calculate the mean. Calculate each deviation. deviation = observation – mean deviation: 1 - 5 = - 4 deviation: 8 - 5 = 3 = 5

Measuring Spread: The Standard Deviation xi (xi-mean) (xi-mean)2 1 1 - 5 = -4 (-4)2 = 16 3 3 - 5 = -2 (-2)2 = 4 4 4 - 5 = -1 (-1)2 = 1 5 5 - 5 = 0 (0)2 = 0 7 7 - 5 = 2 (2)2 = 4 8 8 - 5 = 3 (3)2 = 9 9 9 - 5 = 4 (4)2 = 16 Sum=? 3) Square each deviation. 4) Find the “average” squared deviation. Calculate the sum of the squared deviations divided by (n-1)…this is called the variance. 5) Calculate the square root of the variance…this is the standard deviation. “average” squared deviation = 52/(9-1) = 6.5 This is the variance. Standard deviation = square root of variance =

Measuring Spread: The Standard Deviation The standard deviation sx measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. The average squared distance is called the variance. df

Degrees of Freedom (df) n deviations contain (n - 1) independent pieces of information about variability

ENTER DATA IN L1 1-Vars Stats on L1 or use List menu option Using a Calculator: ENTER DATA IN L1 1-Vars Stats on L1 or use List menu option

Which measure(s) of variability is/are resistant? IQR

Describing Quantitative Data Choosing Measures of Center and Spread We now have a choice between two descriptions for center and spread Mean and Standard Deviation Median and Interquartile Range Describing Quantitative Data The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers. Use mean and standard deviation only for reasonably symmetric distributions that don’t have outliers. NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA!

COEFFICIENT OF VARIATION: a measurement of the relative variability (or consistency) of data

CV is used to compare variability or consistency A sample of newborn infants had a mean weight of 6.2 pounds with a standard deviation of 1 pound. A sample of three-month-old children had a mean weight of 10.5 pounds with a standard deviation of 1.5 pounds. Which (newborns or 3-month-olds) are more variable in weight?

To compare variability, compare Coefficient of Variation For newborns: For 3-month-olds: Higher CV: more variable CV = 16% CV = 14% Lower CV: more consistent

Use Coefficient of Variation To compare two groups of data, to answer: Which is more consistent? Which is more variable?

Chapter 1 Summary Data Analysis is the art of describing data in context using graphs and numerical summaries. The purpose is to describe the most important features of a dataset.