© The McGraw-Hill Companies, Inc., 2000 3-1 Chapter 3 Data Description.

Slides:



Advertisements
Similar presentations
Chapter 3 Data Description
Advertisements

Measures of Dispersion
1 1 Slide © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Descriptive Statistics
Calculating & Reporting Healthcare Statistics
B a c kn e x t h o m e Parameters and Statistics statistic A statistic is a descriptive measure computed from a sample of data. parameter A parameter is.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Created by Tom Wegleitner, Centreville, Virginia Section 3-1.
Slides by JOHN LOUCKS St. Edward’s University.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Introduction to Statistics Chapter 3 Using Statistics to summarize.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Describing Data: Numerical
Describing Data Using Numerical Measures
1 Copyright © 2015 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. C H A P T E R T H R E E DATA DESCRIPTION.
Chapter 3 Data Description 1 Copyright © 2012 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Chapter 3 – Descriptive Statistics
1 1 Slide © 2009 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS St. Edward’s University.
 IWBAT summarize data, using measures of central tendency, such as the mean, median, mode, and midrange.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 3 Descriptive Statistics: Numerical Methods.
© Copyright McGraw-Hill CHAPTER 3 Data Description.
Unit 3 Sections 3-1 & 3-2. What we will be able to do throughout this chapter…  Use statistical methods to summarize data  The most familiar method.
Chapter 3 Descriptive Statistics: Numerical Methods Copyright © 2014 by The McGraw-Hill Companies, Inc. All rights reserved.McGraw-Hill/Irwin.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 3 Descriptive Statistics: Numerical Methods.
STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
Chapter 2 Describing Data.
1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.
McGraw-Hill/Irwin Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 3 Descriptive Statistics: Numerical Methods.
1 CHAPTER 3 NUMERICAL DESCRIPTIVE MEASURES. 2 MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA  In Chapter 2, we used tables and graphs to summarize a.
Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.
CHAPTER 3 Data Description. OUTLINE 3-1Introduction 3-2Measures of Central Tendency 3-3Measures of Variation 3-4Measures of Position 3-5Exploratory Data.
INVESTIGATION 1.
Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.
Chap 3-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 3 Describing Data Using Numerical.
 IWBAT summarize data, using measures of central tendency, such as the mean, median, mode, and midrange.
© Copyright McGraw-Hill CHAPTER 3 Data Description.
©2003 Thomson/South-Western 1 Chapter 3 – Data Summary Using Descriptive Measures Slides prepared by Jeff Heyl, Lincoln University ©2003 South-Western/Thomson.
LECTURE CENTRAL TENDENCIES & DISPERSION POSTGRADUATE METHODOLOGY COURSE.
1 Descriptive Statistics 2-1 Overview 2-2 Summarizing Data with Frequency Tables 2-3 Pictures of Data 2-4 Measures of Center 2-5 Measures of Variation.
Chapter 3, Part B Descriptive Statistics: Numerical Measures n Measures of Distribution Shape, Relative Location, and Detecting Outliers n Exploratory.
Summary Statistics: Measures of Location and Dispersion.
Chapter 3 Data Description Section 3-2 Measures of Central Tendency.
Edpsy 511 Exploratory Data Analysis Homework 1: Due 9/19.
Data Summary Using Descriptive Measures Sections 3.1 – 3.6, 3.8
Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons. 3-1 Business Statistics, 4e by Ken Black Chapter 3 Descriptive Statistics.
Chapter 3 Data Description 1 © McGraw-Hill, Bluman, 5 th ed, Chapter 3.
CHAPTER 3 DATA DESCRIPTION © MCGRAW-HILL, BLUMAN, 5 TH ED, CHAPTER 3 1.
Data Description Note: This PowerPoint is only a summary and your main source should be the book. Lecture (8) Lecturer : FATEN AL-HUSSAIN.
Describing Data: Summary Measures. Identifying the Scale of Measurement Before you analyze the data, identify the measurement scale for each variable.
Chapter 3 Section 3 Measures of variation. Measures of Variation Example 3 – 18 Suppose we wish to test two experimental brands of outdoor paint to see.
CHAPTET 3 Data Description. O UTLINE : Introduction. 3-1 Measures of Central Tendency. 3-2 Measures of Variation. 3-3 Measures of Position. 3-4 Exploratory.
Slide 1 Copyright © 2004 Pearson Education, Inc.  Descriptive Statistics summarize or describe the important characteristics of a known set of population.
DATA DESCRIPTION C H A P T E R T H R E E
Data Description Chapter(3) Lecture8)
© McGraw-Hill, Bluman, 5th ed, Chapter 3
Describing, Exploring and Comparing Data
CHAPTER 3 Data Description 9/17/2018 Kasturiarachi.
Copyright © 2012 The McGraw-Hill Companies, Inc.
NUMERICAL DESCRIPTIVE MEASURES
Chapter 3 Describing Data Using Numerical Measures
Numerical Descriptive Measures
CHAPTET 3 Data Description.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
Chapter 3 Data Description
Chapter 3: Data Description
Presentation transcript:

© The McGraw-Hill Companies, Inc., Chapter 3 Data Description

© The McGraw-Hill Companies, Inc., Objectives

3-3 Objectives Summarize data using the measures of central tendency. Describe data using the measures of variation. Identify the position of a data value in a data set using measures of position. Identify outliers or extreme values. Use the techniques of exploratory data analysis (box plot).

© The McGraw-Hill Companies, Inc., Measures of Central Tendency Review: statistic A statistic is a characteristic or measure obtained by using the data from a sample. parameter A parameter is a characteristic or measure obtained by using the data from a specific population.

© The McGraw-Hill Companies, Inc., The Mean (arithmetic average) mean nN The mean is defined to be the sum (  ) of the data values divided by the total number of values (called n or N). X  We will compute two means: one for the sample (X) and one for a finite population of values (  ). The mean, in most cases, is not an actual data value.

© The McGraw-Hill Companies, Inc., The Sample Mean

© The McGraw-Hill Companies, Inc., The Sample Mean - The Sample Mean - Example

© The McGraw-Hill Companies, Inc., The Population Mean

© The McGraw-Hill Companies, Inc., The Population Mean- The Population Mean - Example

© The McGraw-Hill Companies, Inc., The Sample Mean for a Grouped Frequency Distribution Themeanforagroupedfrequency distributuionisgivenby X fX n HereX is the correspond ing classmidpoint. m m = ()  .

© The McGraw-Hill Companies, Inc., The Sample Mean for a Grouped Frequency Distribution - The Sample Mean for a Grouped Frequency Distribution - Example ClassFrequency, f Class Frequency, f

© The McGraw-Hill Companies, Inc., The Sample Mean for a Grouped Frequency Distribution - The Sample Mean for a Grouped Frequency Distribution - Example ClassX  X Frequency, f m f m

© The McGraw-Hill Companies, Inc., The Sample Mean for a Grouped Frequency Distribution - The Sample Mean for a Grouped Frequency Distribution - Example

© The McGraw-Hill Companies, Inc., The Median data array When a data set is ordered, it is called a data array. median The median is defined to be the midpoint of the data array. MD The symbol used to denote the median is MD.

© The McGraw-Hill Companies, Inc., The Median - The Median - Example The weights (in pounds) of seven army recruits are 180, 201, 220, 191, 219, 209, and 186. Find the median. Arrange the data in order and select the middle point.

© The McGraw-Hill Companies, Inc., The Median - The Median - Example Data array: Data array: 180, 186, 191, 201, 209, 219, 220. The median, MD = 201.

© The McGraw-Hill Companies, Inc., The Median odd number In the previous example, there was an odd number of values in the data set. In this case it is easy to select the middle number in the data array.

© The McGraw-Hill Companies, Inc., The Median even number average of the two middle numbers When there is an even number of values in the data set, the median is obtained by taking the average of the two middle numbers.

© The McGraw-Hill Companies, Inc., The Median - The Median - Example Six customers purchased the following number of magazines: 1, 7, 3, 2, 3, 4. Find the median. Arrange the data in order and compute the middle point. Data array: 1, 2, 3, 3, 4, 7. The median, MD = (3 + 3)/2 = 3.

© The McGraw-Hill Companies, Inc., The Median - The Median - Example The ages of 10 college students are: 18, 24, 20, 35, 19, 23, 26, 23, 19, 20. Find the median. Arrange the data in order and compute the middle point.

© The McGraw-Hill Companies, Inc., The Median - The Median - Example Data array:2023 Data array: 18, 19, 19, 20, 20, 23, 23, 24, 26, 35. MD = ( )/2 = The median, MD = ( )/2 = 21.5.

© The McGraw-Hill Companies, Inc., The Mode mode The mode is defined to be the value that occurs most often in a data set. A data set can have more than one mode. no mode A data set is said to have no mode if all values occur with equal frequency.

© The McGraw-Hill Companies, Inc., The Mode - The Mode - Examples The following data represent the duration (in days) of U.S. space shuttle voyages for the years Find the mode. Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11. Mode = 8 Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14. Mode = 8.

© The McGraw-Hill Companies, Inc., The Mode - The Mode - Examples Six strains of bacteria were tested to see how long they could remain alive outside their normal environment. The time, in minutes, is given below. Find the mode. Data set: 2, 3, 5, 7, 8, 10. no mode There is no mode since each data value occurs equally with a frequency of one.

© The McGraw-Hill Companies, Inc., The Mode - The Mode - Examples Eleven different automobiles were tested at a speed of 15 mph for stopping distances. The distance, in feet, is given below. Find the mode. Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26. two modes (bimodal) 1824Why? There are two modes (bimodal). The values are 18 and 24. Why?

© The McGraw-Hill Companies, Inc., The Midrange midrange The midrange is found by adding the lowest and highest values in the data set and dividing by 2. The midrange is a rough estimate of the middle value of the data. MR The symbol that is used to represent the midrange is MR.

© The McGraw-Hill Companies, Inc., The Midrange - The Midrange - Example MR = (1 + 8)/2 = 4.5 Last winter, the city of Brownsville, Minnesota, reported the following number of water-line breaks per month. The data is as follows: 2, 3, 6, 8, 4, 1. Find the midrange. MR = (1 + 8)/2 = 4.5. Note: Note: Extreme values influence the midrange and thus may not be a typical description of the middle.

© The McGraw-Hill Companies, Inc., The Weighted Mean weighted mean The weighted mean is used when the values in a data set are not all equally represented. weighted meanof a variable X The weighted mean of a variable X is found by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights.

© The McGraw-Hill Companies, Inc., The Weighted Mean

© The McGraw-Hill Companies, Inc., Distribution Shapes Frequency distributions can assume many shapes. positively skewed symmetricalnegatively skewed The three most important shapes are positively skewed, symmetrical, and negatively skewed.

© The McGraw-Hill Companies, Inc., Positively Skewed X Y Mode < Median < Mean Positively Skewed

© The McGraw-Hill Companies, Inc., 2000 n 3-51 Symmetrical Y X Symmetrical Mean = Media = Mode

3-52 Negatively Skewed < Median < Mode

© The McGraw-Hill Companies, Inc., Measures of Variation - Range range R The range is defined to be the highest value minus the lowest value. The symbol R is used for the range. R = highest value – lowest value. R = highest value – lowest value. Extremely large or extremely small data values can drastically affect the range.

© The McGraw-Hill Companies, Inc., Measures of Variation – Population Variance

© The McGraw-Hill Companies, Inc., Measures of Variation – Population Standard Deviation

© The McGraw-Hill Companies, Inc., Consider the following data to constitute the population: 10, 60, 50, 30, 40, 20. Find the mean and variance. mean  The mean  = ( )/6 = 210/6 = 35. variance  2 The variance  2 = 1750/6 = See next slide for computations. Measures of Variation - Measures of Variation - Example

© The McGraw-Hill Companies, Inc., Measures of Variation - Measures of Variation - Example

© The McGraw-Hill Companies, Inc., Measures of Variation – Sample Variance The unbiased estimator of thepopulation variance or the sample variance is a statisticwhose value approximates the expected value of apopulation variance. It is denoted by s 2, (), where s XX n and Xsamplemean nsamplesize = =    

© The McGraw-Hill Companies, Inc., Measures of Variation – Sample Standard Deviation The sample standarddeviationis the square root of the samplevariance. = 2 ss XX n     (). 2 1

© The McGraw-Hill Companies, Inc., for the Sample Variance and the Standard Deviation Shortcut Formula for the Sample Variance and the Standard Deviation = = XXn n s XXn n s       ()/ ()/

© The McGraw-Hill Companies, Inc., Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14.  X = = 79.  X 2 = = Sample Variance - Sample Variance - Example

© The McGraw-Hill Companies, Inc., 2000 = 1263  (79) = 3.7 = 2 s XXn n s      ()/ / Sample Variance - Sample Variance - Example

© The McGraw-Hill Companies, Inc., coefficient of variation The coefficient of variation is defined to be the standard deviation divided by the mean. The result is usually expressed as a percentage. Coefficient of Variation CVar s X orCVar  100%100%. =   Should NOT be used with interval level data.

© The McGraw-Hill Companies, Inc., The Empirical (Normal) Rule For any bell-shaped distribution: Approximately 68% of the data values will fall within one standard deviation of the mean. Approximately 95% will fall within two standard deviations of the mean. Approximately 99.7% will fall within three standard deviations of the mean.

© The McGraw-Hill Companies, Inc., The Empirical (Normal) Rule  -- 95%    

© The McGraw-Hill Companies, Inc., Measures of Position – z score standard scorez score The standard score or z score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. z z score The symbol z is used for the z score.

© The McGraw-Hill Companies, Inc., The z score represents the number of standard deviations a data value falls above or below the mean. Measures of Position – z-score Forsamples z XX s Forpopulations z X :. :.     

© The McGraw-Hill Companies, Inc., A student scored 65 on a statistics exam that had a mean of 50 and a standard deviation of 10. Compute the z-score. z = (65 – 50)/10 = 1.5. above That is, the score of 65 is 1.5 standard deviations above the mean. Above - Above - since the z-score is positive. z-score - z-score - Example

© The McGraw-Hill Companies, Inc., Measures of Position – Percentiles Percentiles divide the distribution into 100 groups. P k The P k percentile is defined to be that numerical value such that at most k% of the values are smaller than P k and at most (100 – k)% are larger than P k in an ordered data set.

© The McGraw-Hill Companies, Inc., Percentile Formula The percentile corresponding to a given value (X) is computed by using the formula: Percentile number of values below X total number of values   100%

© The McGraw-Hill Companies, Inc., Percentiles - Example A teacher gives a 20-point test to 10 students. Find the percentile rank of a score of 12. Scores: 18, 15, 12, 6, 8, 2, 3, 5, 20, 10. Ordered set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. Percentile = [( )/10](100%) = 65th percentile. Student did better than 65% of the class.

© The McGraw-Hill Companies, Inc., Percentiles - Finding the value Corresponding to a Given Percentile Procedure: Procedure: Let p be the percentile and n the sample size. Step 1: Step 1: Arrange the data in order. Step 2:c = (n  p)/100 Step 2: Compute c = (n  p)/100. Step 3:c c c c+1 Step 3: If c is not a whole number, round up to the next whole number. If c is a whole number, use the value halfway between c and c+1.

© The McGraw-Hill Companies, Inc., Percentiles - Finding the value Corresponding to a Given Percentile Step 4: Step 4: The value of c is the position value of the required percentile. Example: Find the value of the 25th percentile for the following data set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. Note: Note: the data set is already ordered. n = 10, p = 25, so c = (10  25)/100 = 2.5. Hence round up to c = 3.

© The McGraw-Hill Companies, Inc., Percentiles - Finding the value Corresponding to a Given Percentile Thus, the value of the 25th percentile is the value X = 5. Find the 80th percentile. c = (10  80)/100 = 8. Thus the value of the 80th percentile is the average of the 8th and 9th values. Thus, the 80th percentile for the data set is ( )/2 = 16.5.

© The McGraw-Hill Companies, Inc., Special Percentiles – Deciles and Quartiles Deciles Deciles divide the data set into 10 groups. Deciles are denoted by D 1, D 2, …, D 9 with the corresponding percentiles being P 10, P 20, …, P 90 Quartiles Quartiles divide the data set into 4 groups.

© The McGraw-Hill Companies, Inc., Special Percentiles – Deciles and Quartiles Quartiles are denoted by Q 1, Q 2, and Q 3 with the corresponding percentiles being P 25, P 50, and P 75. The median is the same as P 50 or Q 2.

© The McGraw-Hill Companies, Inc., Outliers and the Interquartile Range (IQR) outlier An outlier is an extremely high or an extremely low data value when compared with the rest of the data values. Interquartile Range IQR = Q 3 – Q 1 The Interquartile Range, IQR = Q 3 – Q 1.

© The McGraw-Hill Companies, Inc., Outliers and the Interquartile Range (IQR) To determine whether a data value can be considered as an outlier: Step 1: Step 1: Compute Q 1 and Q 3. Step 2: Step 2: Find the IQR = Q 3 – Q 1. Step 3: Step 3: Compute (1.5)(IQR). Step 4: Step 4: Compute Q 1 – (1.5)(IQR) and Q 3 + (1.5)(IQR).

© The McGraw-Hill Companies, Inc., Outliers and the Interquartile Range (IQR) To determine whether a data value can be considered as an outlier: Step 5: Step 5: Compare the data value (say X) with Q 1 – (1.5)(IQR) and Q 3 + (1.5)(IQR). If X Q 3 + (1.5)(IQR), then X is considered an outlier.

© The McGraw-Hill Companies, Inc., Outliers and the Interquartile Range (IQR) - Outliers and the Interquartile Range (IQR) - Example Given the data set 5, 6, 12, 13, 15, 18, 22, 50, can the value of 50 be considered as an outlier? Verify Q 1 = 9, Q 3 = 20, IQR = 11. Verify. (1.5)(IQR) = (1.5)(11) = – 16.5 = – 7.5 and = The value of 50 is outside the range – 7.5 to 36.5, hence 50 is an outlier.

© The McGraw-Hill Companies, Inc., Exploratory Data Analysis – Box Plot box plot minimum valuelower hinge medianupper hinge maximum value When the data set contains a small number of values, a box plot is used to graphically represent the data set. These plots involve five values: the minimum value, the lower hinge, the median, the upper hinge, and the maximum value.

© The McGraw-Hill Companies, Inc., Exploratory Data Analysis – Box Plot Quartile 1 lower hinge. Quartile 1 is the lower hinge. It is the median of all values less than or equal to the median (quartile 2) when the data set has an odd number of values, or as the median of all values less than the median when the data set has an even number of values.

© The McGraw-Hill Companies, Inc., Exploratory Data Analysis – Box Plot Quartile 3 upper hinge. Quartile 3 is the upper hinge. It is median of all values greater than or equal to the median when the data set has an odd number of values, or as the median of all values greater than the median when the data set has an even number of values.

© The McGraw-Hill Companies, Inc., Exploratory Data Analysis - Box Plot - (cardiograms data) Exploratory Data Analysis - Box Plot - Example (cardiograms data)

© The McGraw-Hill Companies, Inc., Information Obtained from a Box Plot If the median is near the center of the box, the distribution is approximately symmetric. If the median falls to the left of the center of the box, the distribution is positively skewed. If the median falls to the right of the center of the box, the distribution is negatively skewed.

© The McGraw-Hill Companies, Inc., Information Obtained from a Box Plot If the lines are about the same length, the distribution is approximately symmetric. If the right line is larger than the left line, the distribution is positively skewed. If the left line is larger than the right line, the distribution is negatively skewed.