Chapter 3 Numerically Summarizing Data


3.1 Measures of Central Tendency

The following chart summarizes some background information on 5 students at Joliet Junior College (JJC).

Name      Age  Gender  Semesters Completed at JJC  Overall GPA
Jennifer  21   Female  1                           3.5
Amy       19   Female  2                           2.75
Brian     18   Male    4                           3.25
Mark      18   Male    2                           3.0
Jim       24   Male    3                           4.0

Which of the above data would be qualitative?
Answer: Gender

A parameter is a descriptive measure of a population. A statistic is a descriptive measure of a sample. A statistic is an unbiased estimator of a parameter if it does not consistently over- or underestimate the parameter.

The arithmetic mean of a variable is computed by determining the sum of all the values of the variable in the data set divided by the number of observations.

The population arithmetic mean is computed using all the individuals in a population. The population mean is a parameter. The population arithmetic mean is denoted by μ, where μ = (x_1 + x_2 + ... + x_N) / N and N is the number of individuals in the population.

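The sample arithmetic mean is computed using sample data. The sample mean is a statistic that is an unbiased estimator of the population mean. The sample arithmetic mean is denoted by x̄ (read "x-bar"), where x̄ = (x_1 + x_2 + ... + x_n) / n and n is the number of observations in the sample.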

EXAMPLE Computing a Population Mean and a Sample Mean
The following chart gives a summary of some background information on a Calculus class.

Name      Age  Gender  Overall GPA
Jennifer  21   Female  3.5
Amy       19   Female  2.75
Brian     25   Male    3.95
Jane      19   Female  2.75
Mark      18   Male    3.0
Julie     19   Female  3.85
Jim       24   Male    4.0
Ted       25   Male    3.7
Michel    19   Female  3.75
Amanda    19   Female  3.65
Linda     19   Female  4.0

Compute the Arithmetic Mean
Treat the students in this class as a population. Compute the population mean of the GPA. Then take a simple random sample of n = 5 students and compute the sample mean of the GPA. Obtain a second simple random sample of n = 5 students and again compute the sample mean of the GPA.

The population (N = 11) mean:
3.5 + 2.75 + 3.95 + 2.75 + 3.0 + 3.85 + 4.0 + 3.7 + 3.75 + 3.65 + 4.0 = 38.9
μ = 38.9 / 11 ≈ 3.54
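The computation above can be sketched in Python. This is only an illustration: the seed is an arbitrary choice made so the run is repeatable, and the two samples it draws are not the ones used in the original lecture.

```python
import random
from statistics import mean

# GPAs of the 11 students in the Calculus class, treated as the population.
gpas = [3.5, 2.75, 3.95, 2.75, 3.0, 3.85, 4.0, 3.7, 3.75, 3.65, 4.0]

# Population mean: sum of all values divided by N.
population_mean = mean(gpas)

# Two simple random samples of n = 5, each drawn without replacement.
# The seed is an assumption, used only to make the example repeatable.
rng = random.Random(1)
sample_1 = rng.sample(gpas, 5)
sample_2 = rng.sample(gpas, 5)

print(f"population mean: {population_mean:.3f}")
print(f"sample 1 {sample_1}: mean {mean(sample_1):.3f}")
print(f"sample 2 {sample_2}: mean {mean(sample_2):.3f}")
```

The two sample means will generally differ from each other and from the population mean, which is why the sample mean is a statistic and the population mean is a parameter.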

The median of a variable is the value that lies in the middle of the data when arranged in ascending order. That is, half the data is below the median and half the data is above the median. We use M to represent the median.

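EXAMPLE Computing the Median of Data
Find the population median of the GPA from the earlier example. Arrange the 11 values in ascending order and number their positions:

Value:     2.75  2.75  3.0  3.5  3.65  3.7  3.75  3.85  3.95  4.0  4.0
Position:  1     2     3    4    5     6    7     8     9     10   11

With N = 11 observations, the median is the value in position (11 + 1) / 2 = 6, so M = 3.7.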
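A minimal Python sketch of the same steps (sort, then pick the middle position) follows. It covers only the odd-sized case that occurs here, and the variable names are illustrative.

```python
from statistics import median

gpas = [3.5, 2.75, 3.95, 2.75, 3.0, 3.85, 4.0, 3.7, 3.75, 3.65, 4.0]

# Step 1: arrange the data in ascending order.
ordered = sorted(gpas)

# Step 2: with an odd number of observations, the median sits at
# position (n + 1) / 2, counting from 1.
n = len(ordered)
position = (n + 1) // 2
M = ordered[position - 1]

print(ordered)
print(f"position {position}, median M = {M}")

# Cross-check against the standard library.
assert M == median(gpas)
```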

The mode of a variable is the most frequent observation of the variable that occurs in the data set. If there is no observation that occurs with the most frequency, we say the data has no mode.

EXAMPLE Finding the Mode of a Data Set The data on the next slide represent the Vice Presidents of the United States and their state of birth. Find the mode.

The mode is New York.
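Since the Vice President data are not reproduced in this transcript, the sketch below illustrates the same idea on the ages from the Calculus class table instead. It counts each value and reports the most frequent one, following the definition above.

```python
from collections import Counter

# Ages of the 11 students in the Calculus class (from the earlier table).
ages = [21, 19, 25, 19, 18, 19, 24, 25, 19, 19, 19]

counts = Counter(ages)
highest = max(counts.values())

# If every value occurs exactly once, the data have no mode.
if highest == 1:
    modes = []
else:
    modes = [value for value, count in counts.items() if count == highest]

print(counts)
print("mode(s):", modes)
```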

The arithmetic mean is sensitive to extreme (very large or small) values in the data set, while the median is not. We say the median is resistant to extreme values, but the arithmetic mean is not.

When data sets have unusually large or small values relative to the entire set of data or when the distribution of the data is skewed, the median is the preferred measure of central tendency over the arithmetic mean because it is more representative of the typical observation.
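The effect of one extreme value can be seen in a small sketch. The numbers below are made up purely for illustration and do not come from the lecture data.

```python
from statistics import mean, median

# Made-up illustrative values: the second list replaces the largest
# observation with an extreme one.
typical = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 500]

print("mean:  ", mean(typical), "->", mean(with_extreme))
print("median:", median(typical), "->", median(with_extreme))
```

The mean is pulled far toward the extreme value while the median does not move, which is what "resistant" means here.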

EXAMPLE Identifying the Shape of the Distribution Based on the Mean and Median
The following data represent the asking price of homes for sale in Lincoln, NE. Source: http://www.homeseekers.com

Find the mean and median. Use the mean and median to identify the shape of the distribution. Verify your result by drawing a histogram of the data.

Using MINITAB/Excel/SPSS, we find that the mean asking price is $143,509 and the median asking price is $131,825. Because the mean is noticeably larger than the median, we would conjecture that the distribution is skewed right.

3.2 Measures of Dispersion

To order food at a McDonald’s Restaurant, one must choose from multiple lines, while at Wendy’s Restaurant, one enters a single line. The following data represent the wait time (in minutes) in line for a simple random sample of 30 customers at each restaurant during the lunch hour. For each sample, answer the following:
(a) What was the mean wait time?
(b) Draw a histogram of each restaurant’s wait time.
(c) Which restaurant’s wait time appears more dispersed? Which line would you prefer to wait in? Why?

Wait Time at Wendy’s (minutes)
1.50 0.79 1.01 1.66 0.94 0.67
2.53 1.20 1.46 0.89 0.95 0.90
1.88 2.94 1.40 1.33 1.20 0.84
3.99 1.90 1.00 1.54 0.99 0.35
0.90 1.23 0.92 1.09 1.72 2.00

Wait Time at McDonald’s (minutes)
3.50 0.00 0.38 0.43 1.82 3.04
0.00 0.26 0.14 0.60 2.33 2.54
1.97 0.71 2.22 4.54 0.80 0.50
0.00 0.28 0.44 1.38 0.92 1.17
3.08 2.75 0.36 3.10 2.19 0.23

The mean wait time in each line is 1.39 minutes.

The range, R, of a variable is the difference between the largest data value and the smallest data value. That is, Range = R = Largest Data Value - Smallest Data Value

EXAMPLE Finding the Range of a Set of Data
Find the range of the student GPAs collected in Section 3.1.
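For the GPA population from Section 3.1, the range can be checked with a short Python sketch that applies the definition directly.

```python
gpas = [3.5, 2.75, 3.95, 2.75, 3.0, 3.85, 4.0, 3.7, 3.75, 3.65, 4.0]

# Range = largest data value - smallest data value.
largest = max(gpas)
smallest = min(gpas)
R = largest - smallest

print(f"R = {largest} - {smallest} = {R}")
```

Note that the range uses only two observations, so it says nothing about how the remaining values are spread out.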

The population variance of a variable is the sum of squared deviations about the population mean divided by the number of observations in the population, N.

The population variance is symbolically represented by σ² (lowercase Greek sigma squared): σ² = [(x_1 - μ)² + (x_2 - μ)² + ... + (x_N - μ)²] / N. Note: When using this formula, do not round until the last computation. Use as many decimals as allowed by your calculator in order to avoid round-off errors.

EXAMPLE Computing a Population Variance Compute the population variance of the population data collected in Section 3.1.
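A sketch of this computation in Python, applying the definition step by step and then cross-checking with the standard library's `statistics.pvariance`. No intermediate value is rounded, in line with the note above.

```python
from statistics import pvariance

gpas = [3.5, 2.75, 3.95, 2.75, 3.0, 3.85, 4.0, 3.7, 3.75, 3.65, 4.0]

N = len(gpas)
mu = sum(gpas) / N

# Squared deviation of each observation about the population mean.
squared_deviations = [(x - mu) ** 2 for x in gpas]

# Population variance: sum of squared deviations divided by N.
sigma_sq = sum(squared_deviations) / N

print(f"mu = {mu:.4f}")
print(f"sigma^2 = {sigma_sq:.4f}")

# The library function should agree up to floating-point error.
assert abs(sigma_sq - pvariance(gpas)) < 1e-12
```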

The sample variance, s², is computed by determining the sum of squared deviations about the sample mean and then dividing this result by n - 1: s² = [(x_1 - x̄)² + (x_2 - x̄)² + ... + (x_n - x̄)²] / (n - 1).

Note: Whenever a statistic consistently overestimates or underestimates a parameter, it is called biased. To obtain an unbiased estimate of the population variance, we divide the sum of the squared deviations about the mean by n - 1.
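The role of n - 1 can be checked by brute force on a tiny population. The sketch below uses the five JJC GPAs from Section 3.1 as the population and, as a simplifying assumption, enumerates every equally likely sample of size 2 drawn with replacement. It then averages the two candidate estimators over all samples.

```python
from itertools import product
from statistics import mean, pvariance, variance

# A tiny population: the five JJC GPAs from Section 3.1.
population = [3.5, 2.75, 3.25, 3.0, 4.0]
sigma_sq = pvariance(population)

# Every possible sample of size n = 2, drawn with replacement
# (5 * 5 = 25 equally likely samples). Sampling with replacement is
# an assumption made so that the observations are independent.
n = 2
samples = list(product(population, repeat=n))

# Average of each estimator over all samples.
avg_divide_by_n = mean([pvariance(s) for s in samples])
avg_divide_by_n_minus_1 = mean([variance(s) for s in samples])

print(f"population variance:         {sigma_sq:.4f}")
print(f"average when dividing by n:  {avg_divide_by_n:.4f}")
print(f"average when dividing by n-1: {avg_divide_by_n_minus_1:.4f}")
```

Dividing by n underestimates the population variance on average, while dividing by n - 1 matches it, which is the sense in which the sample variance is unbiased.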

EXAMPLE Computing a Sample Variance Compute the sample variance using the sample data from Section 3.1

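The population standard deviation is denoted by σ. It is obtained by taking the square root of the population variance, so that σ = √(σ²). Likewise, the sample standard deviation, s, is the square root of the sample variance: s = √(s²).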

EXAMPLE Computing a Population Standard Deviation and Sample Standard Deviation Compute the population and sample standard deviation for the data obtained in Section 3.1

EXAMPLE Comparing Standard Deviations
Determine the standard deviation of the waiting time for Wendy’s and McDonald’s. Which is larger? Why?

Sample standard deviation for Wendy’s: 0.738 minutes
Sample standard deviation for McDonald’s: 1.265 minutes
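These values can be reproduced with Python's `statistics` module. The sketch assumes that the first block of 30 wait times listed earlier is the Wendy’s sample and the second block is the McDonald’s sample, an assignment that is consistent with the standard deviations reported above.

```python
from statistics import mean, stdev

# Assumed labeling: first block of 30 values = Wendy's, second = McDonald's.
wendys = [
    1.50, 0.79, 1.01, 1.66, 0.94, 0.67, 2.53, 1.20, 1.46, 0.89,
    0.95, 0.90, 1.88, 2.94, 1.40, 1.33, 1.20, 0.84, 3.99, 1.90,
    1.00, 1.54, 0.99, 0.35, 0.90, 1.23, 0.92, 1.09, 1.72, 2.00,
]
mcdonalds = [
    3.50, 0.00, 0.38, 0.43, 1.82, 3.04, 0.00, 0.26, 0.14, 0.60,
    2.33, 2.54, 1.97, 0.71, 2.22, 4.54, 0.80, 0.50, 0.00, 0.28,
    0.44, 1.38, 0.92, 1.17, 3.08, 2.75, 0.36, 3.10, 2.19, 0.23,
]

for name, waits in [("Wendy's", wendys), ("McDonald's", mcdonalds)]:
    # stdev() divides by n - 1, i.e. it is the sample standard deviation.
    print(f"{name}: n = {len(waits)}, mean = {mean(waits):.2f}, s = {stdev(waits):.3f}")
```

Both samples have nearly the same mean, so the difference between the two restaurants shows up only in the measure of dispersion.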

EXAMPLE Using the Empirical Rule
The following data represent the serum HDL cholesterol of the 54 female patients of a family doctor.

41 48 43 38 35 37 44 44 44
62 75 77 58 82 39 85 55 54
67 69 69 70 65 72 74 74 74
60 60 60 61 62 63 64 64 64
54 54 55 56 56 56 57 58 59
45 47 47 48 48 50 52 52 53

(a) Compute the population mean and standard deviation. (b) Draw a histogram to verify the data is bell-shaped. (c) Determine the percentage of patients that have serum HDL within 3 standard deviations of the mean according to the Empirical Rule. (d) Determine the percentage of patients that have serum HDL between 34 and 80.8 according to the Empirical Rule. (e) Determine the actual percentage of patients that have serum HDL between 34 and 80.8.

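(a) Using a TI-83 Plus graphing calculator, we find μ ≈ 57.4 and σ ≈ 11.7.
(b) [Histogram of the serum HDL data; the figure is not reproduced in this transcript.]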

(c) According to the Empirical Rule, approximately 99.7% of the patients will have serum HDL cholesterol levels within 3 standard deviations of the mean. That is, approximately 99.7% of the patients will have serum HDL cholesterol levels greater than or equal to 57.4 - 3(11.7) = 22.3 and less than or equal to 57.4 + 3(11.7) = 92.5.

(d) Because 34 is 2 standard deviations below the mean (57.4 - 2(11.7) = 34) and 80.8 is 2 standard deviations above the mean (57.4 + 2(11.7) = 80.8), the Empirical Rule states that approximately 95% of the data will lie between 34 and 80.8.

(e) There are no observations below 34. There are 2 observations greater than 80.8. Therefore, 52/54 = 96.3% of the data lie between 34 and 80.8.
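Parts (a), (c), (d), and (e) can be checked with a short Python sketch. It treats the 54 patients as a population (so it uses `pstdev`) and works with the unrounded mean and standard deviation, so its interval endpoints differ slightly from the rounded ones above.

```python
from statistics import mean, pstdev

hdl = [
    41, 48, 43, 38, 35, 37, 44, 44, 44, 62, 75, 77, 58, 82, 39, 85, 55, 54,
    67, 69, 69, 70, 65, 72, 74, 74, 74, 60, 60, 60, 61, 62, 63, 64, 64, 64,
    54, 54, 55, 56, 56, 56, 57, 58, 59, 45, 47, 47, 48, 48, 50, 52, 52, 53,
]

mu = mean(hdl)
sigma = pstdev(hdl)  # population standard deviation (divides by N)
print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")

# Empirical Rule percentages for 2 and 3 standard deviations.
within = {}
for k, rule_pct in [(2, 95.0), (3, 99.7)]:
    low, high = mu - k * sigma, mu + k * sigma
    within[k] = sum(low <= x <= high for x in hdl)
    actual_pct = 100 * within[k] / len(hdl)
    print(f"k = {k}: [{low:.1f}, {high:.1f}] "
          f"rule ~{rule_pct}%, actual {actual_pct:.1f}%")
```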

EXAMPLE Using Chebyshev’s Theorem Using the data from the previous example, use Chebyshev’s Theorem to (a) determine the percentage of patients that have serum HDL within 3 standard deviations of the mean. (b) determine the percentage of patients that have serum HDL between 34 and 80.8.

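Answer:
(a) At least (1 - 1/9)*100% = 88.9% of the patients have serum HDL within 3 standard deviations of the mean.
(b) 57.4 - 34 = 23.4 and 80.8 - 57.4 = 23.4, and 23.4/11.7 = 2, so 34 and 80.8 are each 2 standard deviations from the mean. The percentage of patients that have serum HDL within 2 standard deviations of the mean is therefore at least (1 - 1/4)*100% = 75%.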
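Both answers come from the same bound, 1 - 1/k², which can be wrapped in a small helper. The function name below is illustrative, not part of any library.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of observations within k standard deviations
    of the mean, for any distribution (requires k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"k = {k}: at least {100 * chebyshev_lower_bound(k):.1f}%")
```

Unlike the Empirical Rule, this bound does not assume a bell-shaped distribution, which is why it is lower: the actual 96.3% found for k = 2 comfortably exceeds the guaranteed 75%.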