Measures of Center.

Presentation on theme: "Measures of Center."— Presentation transcript:

1 Measures of Center

2 Measuring Center: The Mean
The most common measure of center is the ordinary arithmetic average, or mean. To find the mean $\bar{x}$ (pronounced “x-bar”) of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, x_3, \dots, x_n$, their mean is:
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$
In mathematics, the capital Greek letter Σ is short for “add them all up.” Therefore, the formula for the mean can be written in more compact notation:
$\bar{x} = \frac{\sum x_i}{n}$
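The formula translates directly into code. The sketch below is a minimal illustration, not part of the original slides; the function name `mean` is chosen here for convenience, and the data are the six lollipop counts used on a later slide.

```python
def mean(observations):
    """Arithmetic mean: add the values, then divide by how many there are."""
    if not observations:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / len(observations)


# Lollipop counts for six customers (taken from a later slide).
lollipops = [2, 3, 4, 6, 8, 12]
x_bar = mean(lollipops)
print(round(x_bar, 2))  # 5.83
```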

3 SUPPOSE THAT AN INSTRUCTOR IS TEACHING TWO SECTIONS OF A COURSE AND THAT SHE CALCULATES THE MEAN EXAM SCORE TO BE 60 FOR SECTION 1 AND 90 FOR SECTION 2
Do you have enough information to determine the mean exam score for the two sections combined? Explain.
What can you say with certainty about the value of the overall mean for the two sections combined?
Without seeing all of the individual students’ exam scores, what information would you need to be able to calculate the overall mean?
Suppose that section 1 contains 20 students and section 2 contains 30 students. Calculate the overall mean exam score. Is the overall mean closer to 60 or 90?
Give an example of sample sizes for the two sections for which the overall mean turns out to be less than 65.
If you do not know the number of students in the sections but do know that there is the same number of students in the 2 sections, can you determine the overall mean?
Explain how it could happen that a student could transfer from section 1 to section 2 and cause the mean score for each section to decrease.
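One way to check answers to the questions above is to rebuild each section's total from its mean and size. This is a sketch only; the helper name `combined_mean` and the 90/10 split used for the "less than 65" case are illustrative assumptions, not part of the slides.

```python
def combined_mean(means, sizes):
    """Overall mean of several groups, given each group's mean and size."""
    total_points = sum(m * n for m, n in zip(means, sizes))
    return total_points / sum(sizes)


# Section 1: mean 60 with 20 students; section 2: mean 90 with 30 students.
overall = combined_mean([60, 90], [20, 30])
print(overall)  # 78.0

# Equal section sizes put the overall mean halfway between 60 and 90.
equal_sizes = combined_mean([60, 90], [25, 25])

# Hypothetical sizes that pull the overall mean below 65.
lopsided = combined_mean([60, 90], [90, 10])
```

The overall mean always lies between 60 and 90 and sits closer to the mean of the larger section.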

4 Measuring Center: The Median
Another common measure of center is the median. The median is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. To find the median of a distribution:
1) Arrange all observations from smallest to largest.
2) If the number of observations n is odd, the median is the center observation in the ordered list.
3) If the number of observations n is even, the median is the average of the two center observations in the ordered list.
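The procedure for finding the median can be sketched in a few lines. The function name `median` and the three-value example are assumptions made for illustration; the six-value example reuses the lollipop counts from a later slide.

```python
def median(observations):
    """Median by the ordered-list rule: middle value, or mean of the two middle values."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single center observation.
        return ordered[mid]
    # Even n: average the two center observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


even_case = median([8, 2, 6, 3, 12, 4])  # ordered: 2 3 4 6 8 12
odd_case = median([7, 1, 5])             # ordered: 1 5 7
print(even_case, odd_case)  # 5.0 5
```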

5 Finding the Center: The Median
The median is the value with exactly half the data values below it and half above it. It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas. It has the same units as the data.

6 Mean Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance; the median is the equal area point.

7 Measures of Central Tendency
Mode – the observation that occurs the most often.
There can be more than one mode.
If all values occur only once, there is no mode.
Not used as often as the mean & median.
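A minimal sketch of these rules follows. The function name `modes` and the small data sets are hypothetical, chosen only to exercise the three cases (one mode, several modes, no mode).

```python
from collections import Counter


def modes(observations):
    """Return every most-frequent value, or an empty list when nothing repeats."""
    if not observations:
        return []
    counts = Counter(observations)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs only once, so there is no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([4, 4, 5]))        # [4]
print(modes([1, 2, 2, 3, 3]))  # [2, 3]
print(modes([1, 2, 3]))        # []
```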

8 Another Measure of Center
As a measure of center, the midrange may also be used (the average of the minimum and maximum values). However it is very sensitive to skewed distributions and outliers. The median is a more reasonable choice for center than the midrange in skewed distributions.

9 Run 1-Var Stats on your list
Using the calculator . . . Enter the data in a list. Then either go to the LIST menu, highlight MATH, and find your function, OR go to the STAT menu, highlight CALC, and run 1-Var Stats on your list.

10 Describing Quantitative Data
Measuring Center: Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers. 10 30 5 25 40 20 15 85 65 60 45
[Stemplot of the travel times.] Key: 4|5 represents a New York worker who reported a 45-minute travel time to work.

11 Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median. The numbers are in order & n is odd – so find the middle observation. The median is 4 lollipops!

12 Now suppose we had 6 customers who bought the following number of lollipops
2 3 4 6 8 12 Find the mean and median number of lollipops.

13 What would happen to the median & mean if the 12 lollipops were 20?
2 3 4 6 8 20 The median is . . . 5 The mean is . . . 7.17 What happened?

14 What would happen to the median & mean if the 20 lollipops were 50?
2 3 4 6 8 50 The median is . . . 5 The mean is . . . 12.17 What happened?

15 Resistant - Statistics that are not affected by extreme values (outliers) Is the median resistant? YES Is the mean resistant? NO
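The lollipop slides can be replayed in code to see resistance at work. This sketch uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Five customers stay fixed; only the largest purchase changes (12, then 20, then 50).
base = [2, 3, 4, 6, 8]
results = {}
for largest in (12, 20, 50):
    data = base + [largest]
    results[largest] = (median(data), round(mean(data), 2))
    print(largest, results[largest])

# The median stays at 5.0 each time, while the mean moves from 5.83 to 7.17 to 12.17.
```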

16 Comparing the mean and the median
In a symmetric distribution, the mean and the median are the same. In a skewed distribution, the median remains at the center point, but the mean is pulled in the direction of the skew: below the median for a left skew and above it for a right skew.
[Figure: positions of the mean and median for a symmetric distribution, a left-skewed distribution, and a right-skewed distribution.]

17 Trimmed mean
To calculate a trimmed mean:
1) Multiply the % to trim by n.
2) Truncate that many observations from BOTH ends of the distribution (when listed in order).
3) Calculate the mean with the shortened data set.

18 So remove one observation from each side!
First find the mean of the data, then find a 10% trimmed mean with the following data. 10%(10) = 1
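The trimming steps can be sketched as follows. The slide's own data set did not survive in this transcript, so the ten values below are hypothetical, and the function name `trimmed_mean` is an assumption. The trim amount is passed as a whole-number percent so the count to remove is computed with integer arithmetic.

```python
def trimmed_mean(observations, percent):
    """Mean after dropping `percent`% of the observations from EACH end."""
    ordered = sorted(observations)
    n = len(ordered)
    k = (percent * n) // 100  # number of observations to drop per side
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Hypothetical data: ten values with one unusually large observation.
data = [2, 5, 6, 7, 8, 9, 10, 11, 12, 40]
plain = trimmed_mean(data, 0)      # ordinary mean, nothing trimmed
trimmed = trimmed_mean(data, 10)   # 10% of 10 = 1 value removed from each side
print(plain, trimmed)  # 11.0 8.5
```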

19 WEIGHTED MEAN
Midterm: 92, Paper: 80, Final: 88
Find your semester average if the Midterm is weighted 25%, the Paper 25% & the Final 50%. .25(92) + .25(80) + .5(88) = 87

20 WEIGHTED MEAN
A weighted mean is an average computed by giving different weights to some of the individual values. If all the weights are equal, then the weighted mean is the same as the arithmetic mean.
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
x is each data value, w is the number of occurrences of x (its weight), and x̄ is the weighted mean.
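As a rough check of the semester-average example, the weighted mean can be computed as the sum of weight times value divided by the sum of the weights. The function name `weighted_mean` is an assumption made for this sketch.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [92, 80, 88]  # midterm, paper, final
semester = weighted_mean(scores, [0.25, 0.25, 0.50])
print(semester)  # 87.0

# With equal weights the result is just the ordinary arithmetic mean.
equal = weighted_mean(scores, [1, 1, 1])
```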

21 Variability

22 NOW TAKE A LOOK AT COMPARING THE DOT PLOTS
CONSIDER THE FOLLOWING 3 SAMPLE DATA SETS: I II III COMPUTE THE RANGE, MEDIAN & MEAN FOR EACH DATA SET. WHAT DO YOU NOTICE?

23 Why is the study of variability important?
It allows us to distinguish between usual & unusual values.
In some situations, we want more or less variability.
When describing data, never rely on center alone.
As with measures of center, you must choose the most appropriate measure of spread.

24 Measures of Variability
range (max – min)
interquartile range (Q3 – Q1)
deviations
variance
standard deviation (σ, the lower case Greek letter sigma)

25 Describing Quantitative Data
Measuring Spread: The Interquartile Range (IQR)
A measure of center alone can be misleading. A useful numerical description of a distribution requires both a measure of center and a measure of spread.
How to Calculate the Quartiles and the Interquartile Range
To calculate the quartiles:
1) Arrange the observations in increasing order and locate the median M.
2) The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.
3) The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.
The interquartile range (IQR) is defined as: IQR = Q3 – Q1
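The quartile steps above can be sketched as a "median of each half" routine. The function names are assumptions, and the data are the six lollipop counts from an earlier slide. Calculators and software packages may use slightly different quartile conventions, so their output can differ from this rule on some data sets.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(observations):
    """Return (Q1, M, Q3) using the observations left and right of the median."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]       # observations to the left of the median
    upper = ordered[n - half:]   # observations to the right of the median
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


q1, m, q3 = quartiles([2, 3, 4, 6, 8, 12])
iqr = q3 - q1
print(q1, m, q3, iqr)  # 3 5.0 8 5
```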

26 Describing Quantitative Data
Find and Interpret the IQR
Travel times to work for 20 randomly selected New Yorkers: 10 30 5 25 40 20 15 85 65 60 45
In increasing order: 5 10 15 20 25 30 40 45 60 65 85
Q1 = 15, M = 22.5, Q3 = 42.5
IQR = Q3 – Q1 = 42.5 – 15 = 27.5 minutes
Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.

27 Describing Quantitative Data
Identifying Outliers
In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.
Definition: The 1.5 x IQR Rule for Outliers. Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
Example, page 57: In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes. For these data, 1.5 x IQR = 1.5(27.5) = 41.25
Q1 – 1.5 x IQR = 15 – 41.25 = -26.25
Q3 + 1.5 x IQR = 42.5 + 41.25 = 83.75
Any travel time shorter than -26.25 minutes or longer than 83.75 minutes is considered an outlier.
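The rule of thumb reduces to two cutoffs, sometimes called fences. The sketch below recomputes them from the quartiles reported on the slide and then checks the travel times that are listed in this transcript; the function name `outlier_fences` is an assumption.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper cutoffs for the k x IQR rule (k = 1.5 by default)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for the New York travel times.
low, high = outlier_fences(15, 42.5)
print(low, high)  # -26.25 83.75

# Travel times as listed in the transcript.
listed_times = [10, 30, 5, 25, 40, 20, 15, 85, 65, 60, 45]
outliers = [t for t in listed_times if t < low or t > high]
print(outliers)  # [85]
```

No travel time can be negative, so in practice only the upper cutoff matters for these data.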

28 The Five-Number Summary
The minimum and maximum values alone tell us little about the distribution as a whole. Likewise, the median and quartiles tell us little about the tails of a distribution. To get a quick summary of both center and spread, combine all five numbers.
Definition: The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest:
Minimum, Q1, M, Q3, Maximum

29 Boxplots (Box-and-Whisker Plots)
The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot. How To Make A Boxplot: A central box is drawn from the first quartile (Q1) to the third quartile (Q3). A line in the box marks the median. Lines (called whiskers) extend from the box out to the smallest and largest observations that are not outliers. Outliers are marked with a special symbol such as an asterisk (*).

30 Construct a Boxplot
Consider our New York travel time data: 10 30 5 25 40 20 15 85 65 60 45
In increasing order: 5 10 15 20 25 30 40 45 60 65 85
Min = 5, Q1 = 15, Median = 22.5, Q3 = 42.5, Max = 85
Recall, the maximum (85) is an outlier by the 1.5 x IQR rule.

31 Measuring Spread: The Standard Deviation
When we use the mean as our measure of center, we need another measure of spread. The most common measure of spread looks at how far each observation is from the mean. This measure is called the standard deviation. Consider the following data on the number of pets owned by a group of 9 children.
1) Calculate the mean. Here the mean is 5.
2) Calculate each deviation. deviation = observation – mean
For example, deviation: 1 – 5 = -4 and deviation: 8 – 5 = 3

32 Measuring Spread: The Standard Deviation
xi    (xi - mean)    (xi - mean)²
1     1 - 5 = -4     (-4)² = 16
3     3 - 5 = -2     (-2)² = 4
4     4 - 5 = -1     (-1)² = 1
4     4 - 5 = -1     (-1)² = 1
4     4 - 5 = -1     (-1)² = 1
5     5 - 5 = 0      (0)² = 0
7     7 - 5 = 2      (2)² = 4
8     8 - 5 = 3      (3)² = 9
9     9 - 5 = 4      (4)² = 16
                     Sum = 52
3) Square each deviation.
4) Find the “average” squared deviation. Calculate the sum of the squared deviations divided by (n-1)…this is called the variance.
5) Calculate the square root of the variance…this is the standard deviation.
“average” squared deviation = 52/(9-1) = 6.5 This is the variance.
Standard deviation = square root of variance = √6.5 ≈ 2.55
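The five steps can be followed line by line in code. In this sketch the nine pet counts include three children with 4 pets each; the transcript shows the value 4 only once, and the repeats are inferred from n = 9, the mean of 5, and the sum of squared deviations of 52. Variable names are illustrative.

```python
import math
from statistics import stdev

# Number of pets for 9 children (the repeated 4s are inferred, see the note above).
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]
n = len(pets)

x_bar = sum(pets) / n                        # step 1: the mean
deviations = [x - x_bar for x in pets]       # step 2: observation - mean
squared = [d ** 2 for d in deviations]       # step 3: square each deviation
variance = sum(squared) / (n - 1)            # step 4: divide the sum by n - 1
std_dev = math.sqrt(variance)                # step 5: square root of the variance

print(x_bar, sum(squared), variance, round(std_dev, 2))  # 5.0 52.0 6.5 2.55

# Cross-check against the standard library's sample standard deviation.
library_value = stdev(pets)
```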

33 Measuring Spread: The Standard Deviation
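The standard deviation $s_x$ measures the typical distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. The average squared distance is called the variance:
$s_x^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$
$s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
The divisor n - 1 is the degrees of freedom (df).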

34 Degrees of Freedom (df)
n deviations contain (n - 1) independent pieces of information about variability

35 Using a Calculator
ENTER DATA IN L1. Run 1-Var Stats on L1 or use the List menu option.

36 Which measure(s) of variability is/are resistant?
IQR

37 Describing Quantitative Data
Choosing Measures of Center and Spread
We now have a choice between two descriptions for center and spread:
Mean and Standard Deviation
Median and Interquartile Range
The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers. Use mean and standard deviation only for reasonably symmetric distributions that don’t have outliers.
NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA!

38 COEFFICIENT OF VARIATION:
a measurement of the relative variability (or consistency) of data

39 CV is used to compare variability or consistency
A sample of newborn infants had a mean weight of 6.2 pounds with a standard deviation of 1 pound. A sample of three-month-old children had a mean weight of 10.5 pounds with a standard deviation of 1.5 pounds. Which (newborns or 3-month-olds) are more variable in weight?

40 To compare variability, compare Coefficient of Variation
For newborns: CV = 1/6.2 ≈ 16%
For 3-month-olds: CV = 1.5/10.5 ≈ 14%
Higher CV: more variable. Lower CV: more consistent.
The newborns are more variable in weight.
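The comparison can be reproduced with a short sketch that takes the coefficient of variation to be the standard deviation divided by the mean, expressed as a percent. The function name is an assumption made for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return 100 * std_dev / mean


newborns = coefficient_of_variation(1.0, 6.2)       # mean 6.2 lb, SD 1 lb
three_months = coefficient_of_variation(1.5, 10.5)  # mean 10.5 lb, SD 1.5 lb
print(round(newborns, 1), round(three_months, 1))  # 16.1 14.3
```

Although the 3-month-olds have the larger standard deviation in pounds, the newborns vary more relative to their mean weight.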

41 Use Coefficient of Variation
To compare two groups of data, to answer: Which is more consistent? Which is more variable?

42 Chapter 1 Summary Data Analysis is the art of describing data in context using graphs and numerical summaries. The purpose is to describe the most important features of a dataset.

