Chapter 3: Descriptive Statistics. LO1 Apply various measures of central tendency— including the mean, median, and mode—to a set of ungrouped data. LO2Apply.


1 Chapter 3: Descriptive Statistics

2 Learning Objectives: LO1 Apply various measures of central tendency, including the mean, median, and mode, to a set of ungrouped data. LO2 Apply various measures of variability, including the range, interquartile range, mean absolute deviation, variance, and standard deviation (using the empirical rule and Chebyshev’s theorem), to a set of ungrouped data. LO3 Compute the mean, median, mode, standard deviation, and variance of grouped data. LO4 Describe a data distribution statistically and graphically using skewness, kurtosis, and box-and-whisker plots. LO5 Use computer packages to compute various measures of central tendency, variation, and shape on a set of data, as well as to describe the data distribution graphically.

3 Measures of Central Tendency, Ungrouped Data: Ungrouped data are raw numbers that have not been summarized by statistical techniques. Measures of central tendency reveal information about the values at the center, or middle part, of a group of numbers (or ordered array). Common measures of central tendency are the mean, median, mode, percentiles, and quartiles.

4 The Arithmetic Mean: The arithmetic mean is commonly called ‘the mean’. It is the average of a group of numbers. It applies to interval and ratio data, not to nominal or ordinal data. The mean is computed by summing all values in the data set and dividing the sum by the number of values. Its value is therefore affected by every value in the data set, including extreme values.

5 Application of Arithmetic Mean in Statistics: The mean serves as a summary statistic of central tendency for data produced by business and economic processes. In these settings it is important to distinguish between the population mean, µ, and the sample mean, x̄. The population mean is based on all of the values in the population; the sample mean uses only some of them.

6 Computing Population Mean: Suppose a company has five departments with 24, 13, 19, 26, and 11 workers. The population mean number of workers per department is 18.6. The computation is µ = Σx / N = (24 + 13 + 19 + 26 + 11) / 5 = 93 / 5 = 18.6.

7 Computing Sample Mean: The sample mean is calculated the same way as the population mean and produces the same answer when computed on the same data. However, separate symbols are used: µ for the population mean and x̄ for the sample mean. Given the numbers 57, 86, 42, 38, 90, and 66, the sample mean is x̄ = Σx / n = (57 + 86 + 42 + 38 + 90 + 66) / 6 = 379 / 6 ≈ 63.17.

8 Impact of Extreme Values on the Mean: The mean is the most commonly used measure of central tendency because of its mathematical properties and because it uses all the data points in the data set. However, the mean is affected by extremely large or extremely small numbers. In the sample mean example, if the largest number, 90, is replaced by 1,000, the mean becomes 1,289 / 6 ≈ 214.83, as opposed to 63.17. If the smallest number, 38, is replaced by 5, the mean becomes 346 / 6 ≈ 57.67, as opposed to 63.17. Extreme values can significantly distort the mean.
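The effect of a single extreme value can be checked numerically. The following is an illustrative Python sketch, not part of the original slides; the helper name `arithmetic_mean` is an assumption, and the data are the six sample values from slide 7.

```python
def arithmetic_mean(values):
    """Sum all values and divide by the number of values."""
    return sum(values) / len(values)

sample = [57, 86, 42, 38, 90, 66]                       # slide 7 data
with_large = [1000 if x == 90 else x for x in sample]   # 90 replaced by 1,000
with_small = [5 if x == 38 else x for x in sample]      # 38 replaced by 5

mean_original = arithmetic_mean(sample)
mean_large = arithmetic_mean(with_large)
mean_small = arithmetic_mean(with_small)

print(round(mean_original, 2))  # 63.17
print(round(mean_large, 2))     # 214.83
print(round(mean_small, 2))     # 57.67
```

Changing one of six values moves the mean by more than 150 in the first case, which is the distortion the slide describes.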

9 The Median: The median is the middle value in an ordered array of numbers. It applies to ordinal, interval, and ratio data. Advantage: it is unaffected by extremely large and extremely small values in the data set. Disadvantage: it does not use all the information in the numbers.

10 Computing the Median: First step: arrange the observations in an ordered array. Second step: for an array with an odd number of terms, the median is the middle number. Third step: for an array with an even number of terms, the median is the average of the two middle numbers. Locating the median: its position in an ordered array of n terms is (n+1)/2.

11 Median Example with an Odd Number of Data: Let X be an ordered array with the following values: 3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22. There are 17 values in the ordered array. Position of median = (n+1)/2 = (17+1)/2 = 9th position. Counting from left to right to the 9th position, the median is 15. Advantage: extreme values do not distort the median. If 22 (the maximum value) is replaced by 100, the median is still 15. If 3 (the minimum value) is replaced by -103, the median is still 15.

12 Median Example with an Even Number of Data: Let X be an ordered array with the following values: 3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21. There are 16 values in the ordered array. Position of median = (n+1)/2 = (16+1)/2 = 8.5th position. The median therefore lies halfway between the 8th and 9th observations (14 and 15): 14 + 0.5(15 - 14) = 14.5, or simply (14+15)/2 = 14.5. Advantage: extreme values do not distort the median. If 21 (the maximum value) is replaced by 100, the median is still 14.5. If 3 (the minimum value) is replaced by -88, the median is still 14.5.
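The odd and even cases can be sketched in a few lines of Python. This is an illustrative sketch rather than anything from the slides; the function name `median_of` is an assumption, and the arrays are those of slides 11 and 12.

```python
def median_of(values):
    """Middle value of the ordered array; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values

odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]   # slide 12 uses the same values without the final 22

print(median_of(odd_array))                # 15
print(median_of(even_array))               # 14.5
print(median_of(odd_array[:-1] + [100]))   # 15   (22 replaced by 100)
print(median_of([-88] + even_array[1:]))   # 14.5 (3 replaced by -88)
```

The last two calls repeat the slides' replacement experiments: the extremes change, the median does not.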

13 The Mode: The mode is the value that occurs most frequently in an array of data. It applies to all levels of data measurement: nominal, ordinal, interval, and ratio. Unimodal describes data sets with a single mode. Bimodal describes data sets that have two modes. Multimodal describes data sets that contain more than two modes.

14 Example of the Mode: Organizing the data into an ordered array helps to locate the mode. [Ordered array not preserved in this transcript.] In the slide's example, 44 is the value that occurs most frequently (5 times), so the mode is 44.
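Counting frequencies makes the mode easy to locate, including the bimodal case. The sketch below is illustrative only: the function name `modes` is an assumption, and the data are hypothetical values chosen so that 44 occurs five times, since the slide's own array is not available.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical data, not the slide's array; 44 appears 5 times.
observations = [37, 41, 44, 44, 44, 44, 44, 46, 46, 48]

print(modes(observations))       # [44]
print(modes([1, 1, 2, 2, 3]))    # [1, 2]  (bimodal)
```

Returning a list rather than a single value keeps bimodal and multimodal data sets from being silently reduced to one mode.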

15 Percentiles: Percentiles are measures of central tendency that divide a group of data into 100 parts. The nth percentile is the value such that at least n percent of the data are below that value and at most (100 - n) percent are above it. For example, if a plant operator takes a safety examination and 87.6% of the safety exam scores are below that person’s score, he or she still scores at only the 87th percentile, even though more than 87% of the scores are lower. The median has the same value as the 50th percentile.

16 Percentiles: Percentiles are stair-step values: for example, there are no values between the 87th and 88th percentiles. Percentile methods are applicable to ordinal, interval, and ratio data and are not applicable to nominal data. In general, percentiles are not influenced by extreme values in the data set.

17 Steps in Determining the Location of a Percentile: 1. Organize the data into ascending order. 2. Calculate the percentile location i = (P/100)·n, where P = percentile of interest, i = percentile location, and n = number of values in the data set. 3. Determine the location: if i is a whole number, the Pth percentile is the average of the value at the ith location and the value at the (i + 1)th location; if i is not a whole number, the Pth percentile is located at the whole-number part of i + 1.

18 Calculating Percentiles, An Example: Raw data: 14, 12, 19, 23, 5, 13, 28, 17. Ordered array: 5, 12, 13, 14, 17, 19, 23, 28. Problem: find the 30th percentile. Number of observations: n = 8. Location of the 30th percentile: i = (30/100)(8) = 2.4. Because i is not a whole number, the location is the whole-number part of i + 1 = 2.4 + 1 = 3.4, which is 3. The 30th percentile is at the 3rd location of the ordered array: 30th percentile = 13.
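The location steps translate directly into code. The sketch below assumes the location index is i = (P/100)·n, which is consistent with the worked example (i + 1 = 3.4 for P = 30 and n = 8); the function name `percentile` and the restriction to integer P strictly between 0 and 100 are assumptions made for this illustration.

```python
def percentile(values, p):
    """Value at the p-th percentile (integer p, 0 < p < 100) using the location i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids floating-point trouble when testing whether i is whole.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the i-th and (i+1)-th values (1-based positions).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is fractional: the value sits at the whole-number part of i + 1 (1-based),
    # which is index `whole` in 0-based terms.
    return ordered[whole]

raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))   # 13
```

Testing `p * n % 100` instead of comparing a float against its integer part is a deliberate choice: products such as 0.7 * 10 are not exactly whole in floating point.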

19 Quartiles: Quartiles are measures of central tendency that divide a group of data into four subgroups or parts. Q1: 25% of the data set is below the first quartile. Q2: 50% of the data set is below the second quartile. Q3: 75% of the data set is below the third quartile. Relationship between quartiles and percentiles: Q1 is equal to the 25th percentile; Q2 is located at the 50th percentile and equals the median; Q3 is equal to the 75th percentile. Quartile values are not necessarily members of the data set.

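20 Calculating Quartiles, An Example: Let X be the ordered array {106, 109, 114, 116, 121, 122, 125, 129}, so n = 8. Q1: i = (25/100)(8) = 2, so Q1 = (109 + 114)/2 = 111.5. Q2: i = (50/100)(8) = 4, so Q2 = (116 + 121)/2 = 118.5. Q3: i = (75/100)(8) = 6, so Q3 = (122 + 125)/2 = 123.5. Note that when i is a whole number, the quartile is the average of the ith and (i+1)th values in the ordered set.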
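Because the quartiles are the 25th, 50th, and 75th percentiles, the same location rule yields all three. This is a minimal sketch under that assumption (location i = (P/100)·n); the helper name `value_at_percentile` is hypothetical.

```python
def value_at_percentile(ordered, p):
    """Apply the percentile location rule to an already ordered list (integer p, 0 < p < 100)."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # Whole-number location: average the i-th and (i+1)-th values.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Fractional location: take the value at the whole-number part of i + 1.
    return ordered[whole]

x = [106, 109, 114, 116, 121, 122, 125, 129]   # ordered array from slide 20
q1, q2, q3 = (value_at_percentile(x, p) for p in (25, 50, 75))

print(q1, q2, q3)   # 111.5 118.5 123.5
```

Statistical packages often interpolate between neighbouring positions instead of using this rule, so their quartiles for the same data may differ slightly.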

21 Measures of Variability, Ungrouped Data: Measures of variability describe the spread or dispersion of data. Using measures of variability together with measures of central tendency gives a more complete description of the data. Measures of variability for ungrouped data include the range, interquartile range, mean absolute deviation, variance, standard deviation, z scores, and coefficient of variation.

22 Measures of Variability, Ungrouped Data: Measures of variability describe the dispersion (spread) or the convergence (clustering) of a set of data. Dispersion describes how far the data are spread out from the mean. Convergence describes how closely the data cluster around the mean. Variability is most frequently expressed in terms of deviation from the norm or mean. The images on the next slides express this visually.

23 Variability [Figure: cash flows plotted around the mean in two panels, "No Variability in Cash Flow (same amounts)" and "Variability in Cash Flow (different amounts)"]

24 Variability [Figure: two panels labeled "No Variability" and "Variability"]

25 Range: The range is the difference between the largest and smallest values in the data set. Advantage: simple to compute. Disadvantages: it ignores all data points except the two extremes, is influenced by extreme values, has no reference point, and has limited use by itself. Example of range using data provided: [worked example not preserved in this transcript].

26 Interquartile Range: The interquartile range contains all values in the interval between the first and third quartiles. It accounts for the middle 50% of values in the ordered data set. It is especially useful when data users are more interested in values toward the middle and less interested in extremes, and it is less influenced by extremes than the range. Interquartile range = Q3 - Q1.
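The range and the interquartile range can be compared on one data set. The sketch below reuses the ordered array from slide 20 as an assumption (the range slide's own data are not available) and takes the quartiles from the whole-number location rule, which averages the 2nd and 3rd values for Q1 and the 6th and 7th values for Q3.

```python
x = [106, 109, 114, 116, 121, 122, 125, 129]   # ordered array from slide 20

data_range = max(x) - min(x)   # uses only the two extremes

# Quartiles via the whole-number location rule (1-based positions 2-3 and 6-7).
q1 = (x[1] + x[2]) / 2
q3 = (x[5] + x[6]) / 2
iqr = q3 - q1                  # spread of the middle 50% of the data

print(data_range)   # 23
print(iqr)          # 12.0
```

Replacing 129 with a much larger value would change the range but leave the interquartile range at 12.0.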

27 Deviation from the Mean: An examination of deviations from the mean can reveal information about the variability of data. However, the individual deviations are used mostly as a tool to compute other measures of variability. Example: the data set 5, 9, 16, 17, 18 has a mean of µ = 13. The deviations (x - µ) show the distance of each value from the mean: -8, -4, 3, 4, 5.

28 Mean Absolute Deviation: Absolute deviations express how far, on average, observations differ from the mean: MAD = Σ|x - µ| / N. The mean absolute deviation is easy to calculate but is less useful statistically than the variance and standard deviation. [The slide's worked example is not preserved in this transcript.]
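As an illustration, the mean absolute deviation can be computed for the data set from slide 27. Treating those five values as the population is an assumption, since this slide's own example is not available; the function name is likewise hypothetical.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

data = [5, 9, 16, 17, 18]   # slide 27 data, mean 13

# Absolute deviations are 8, 4, 3, 4, 5, which sum to 24.
print(mean_absolute_deviation(data))   # 4.8
```

Taking absolute values is what keeps the deviations from cancelling: the signed deviations -8, -4, 3, 4, 5 always sum to zero.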

29 Population Variance: The population variance is the sum of the squared deviations from the mean divided by the number of observations: σ² = Σ(x - µ)² / N. Variance is expressed in squared units of measurement. Squared units are hard to interpret, so the variance is typically used as a step toward obtaining the standard deviation of a data set.

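30 Example of Population Variance: For the slide's x values, the result is expressed as 26.0 units squared. [The x values are not preserved in this transcript; the data set from slide 27 (5, 9, 16, 17, 18) gives the same result: Σ(x - µ)² / N = 130 / 5 = 26.0.]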

31 Population Standard Deviation: The population standard deviation is the square root of the population variance, σ = √σ². It is easier to interpret in practice than the variance, and it measures the dispersion of the population data from the mean.
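The population variance and standard deviation follow in two steps. The sketch below assumes the slide-27 values (5, 9, 16, 17, 18) form the population; they reproduce the 26.0 units squared quoted on slide 30. The function name is an assumption.

```python
import math

def population_variance(values):
    """Sum of squared deviations from the mean, divided by the number of values N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

data = [5, 9, 16, 17, 18]
variance = population_variance(data)   # squared deviations 64 + 16 + 9 + 16 + 25 = 130
std_dev = math.sqrt(variance)

print(variance)            # 26.0
print(round(std_dev, 2))   # 5.1
```

The standard deviation is back in the original units of the data, which is why it is easier to interpret than the variance.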

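32 Example of Sample Variance: The sample variance divides by n - 1 rather than N: s² = Σ(x - x̄)² / (n - 1). Sample variances are also expressed in units squared. [The slide's worked example is not preserved in this transcript.]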

33 Example of Sample Standard Deviation: The sample standard deviation is the square root of the sample variance. It is easier to interpret in practice than squared units, and it is used as an estimator of the population standard deviation.
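A computer package can produce the sample measures directly. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1; treating the slide-27 values as a sample is an assumption made only for illustration.

```python
import statistics

sample = [5, 9, 16, 17, 18]   # assumption: slide 27 values treated as a sample

sample_variance = statistics.variance(sample)   # sum of squared deviations / (n - 1)
sample_std = statistics.stdev(sample)           # square root of the sample variance

print(float(sample_variance))   # 32.5
print(round(sample_std, 2))     # 5.7
```

The same five values give 26.0 when divided by N = 5 and 32.5 when divided by n - 1 = 4, so it matters which formula a package applies by default.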

34 Standard Deviation: The standard deviation is the square root of the variance. The standard deviation of a population is denoted by σ; the standard deviation of a sample is denoted by s.

35 Uses of Standard Deviation: Indicator of financial risk. Quality control: construction of quality control charts and process capability studies. Comparing two or more populations: household incomes in two cities, employee absenteeism at two plants, or, with the standard deviation expressed as a percentage of the mean, the coefficient of variation (CV).

36 Standard Deviation as an Indicator of Financial Risk

37 Symmetric and Asymmetric Distributions: Data are either symmetric or non-symmetric with respect to some measure of central tendency. Statisticians have observed that distributions describing many types of business and economic data tend to be symmetric or approximately normal in shape. In practice, processes that generate such data follow well-defined patterns of data concentration around the mean (the empirical rule). Non-symmetric distributions still obey minimum guarantees on the concentration of data values around the mean (Chebyshev’s theorem).

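38 Empirical Rule: When data are normally distributed or approximately normal, about 68% of the values lie within µ ± 1σ, about 95% within µ ± 2σ, and about 99.7% within µ ± 3σ.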

39 Chebyshev’s Theorem, When Data Are Not Normally Distributed or Are Nonsymmetric: Chebyshev’s theorem applies to all distributions. It gives the minimum proportion of the data that lies within a specified number of standard deviations of the mean.

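40 Chebyshev’s Theorem: A general theorem applying to all distributions: at least 1 - 1/k² of the values fall within µ ± kσ, for k > 1 (at k = 1 the bound is 0 and says nothing). Calculations for k = 2, 3, 4, listed as number of standard deviations k; distance from the mean; minimum proportion of values falling within that distance: k = 2; µ ± 2σ; 1 - 1/2² = 0.75. k = 3; µ ± 3σ; 1 - 1/3² ≈ 0.89. k = 4; µ ± 4σ; 1 - 1/4² ≈ 0.94.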
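The minimum proportions follow from the bound 1 - 1/k², which is the standard statement of Chebyshev's theorem. A short sketch for k = 2, 3, 4 is given below; the function name `chebyshev_minimum` is an assumption.

```python
def chebyshev_minimum(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, round(chebyshev_minimum(k), 4))
# 2 0.75
# 3 0.8889
# 4 0.9375
```

These are lower bounds that hold for any distribution; normally distributed data concentrate far more tightly, as the empirical rule indicates.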

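41 Z Scores: The z score represents the number of standard deviations a value (x) is above or below the mean. It translates a raw value into standard deviation units. With normally distributed data, z scores can be interpreted using the empirical rule. Z score formula: z = (x - µ) / σ for a population, or z = (x - x̄) / s for a sample.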
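A z score is a one-line computation once the mean and standard deviation are known. The sketch below assumes the usual definition z = (x - µ) / σ; the function name and the numbers (mean 50, standard deviation 10) are hypothetical, chosen only to illustrate the sign convention.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical values, not taken from the slides.
print(z_score(70, mean=50, std_dev=10))   # 2.0
print(z_score(35, mean=50, std_dev=10))   # -1.5
```

A value two standard deviations above the mean has z = 2.0; a negative z score indicates a value below the mean.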

42 Coefficient of Variation: The ratio of the standard deviation to the mean, expressed as a percentage. It is a measurement of relative dispersion: CV = (σ / µ)(100).
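The coefficient of variation is useful when two data sets have similar spread but very different means. The sketch below is illustrative: the first data set is the one from slide 27, the second is a hypothetical copy shifted upward by 100, and the function name is an assumption.

```python
import math

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sigma / mu * 100

low_mean = [5, 9, 16, 17, 18]             # slide 27 data: mean 13
high_mean = [x + 100 for x in low_mean]   # hypothetical: same spread, mean 113

print(round(coefficient_of_variation(low_mean), 2))    # 39.22
print(round(coefficient_of_variation(high_mean), 2))   # 4.51
```

Both data sets have the same standard deviation, but relative to their means the first is far more variable.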

43 Examples of Coefficient of Variation [worked examples not preserved in this transcript]

44 Measures of Central Tendency and Variability, Grouped Data: Measures of central tendency: mean, median, mode. Measures of variability: variance, standard deviation.

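45 Mean of Grouped Data: The mean of grouped data is a weighted average of the class midpoints, with the class frequencies as the weights: µ = Σ(f·M) / N, where f = class frequency, M = class midpoint, and N = Σf = total number of observations.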
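The weighted-average idea can be sketched with a small frequency table. The class intervals and frequencies below are hypothetical (the slides' table is not available), and the function name `grouped_mean` is an assumption.

```python
# Hypothetical frequency table: (lower bound, upper bound, frequency).
classes = [(1, 3, 4), (3, 5, 12), (5, 7, 13), (7, 9, 19), (9, 11, 7), (11, 13, 5)]

def grouped_mean(table):
    """Weighted average of class midpoints, using class frequencies as weights."""
    total = sum(f for _, _, f in table)                        # N = sum of frequencies
    weighted = sum(f * (lo + hi) / 2 for lo, hi, f in table)   # sum of f * midpoint
    return weighted / total

print(round(grouped_mean(classes), 2))   # 6.93
```

Here Σ(f·M) = 416 and N = 60. Because every observation is represented by its class midpoint, the result approximates the mean of the underlying raw data.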

46 Example Calculation of Grouped Mean

47 Median of Grouped Data

48 Calculating the Median of Grouped Data

49 Estimating the Mode from Grouped Data: The modal class is the class interval with the greatest frequency (7-under 9 in the slide's example). The mode for the grouped data is the class midpoint of the modal class, so the mode is 8 in that example.

50 Variance and Standard Deviation from Grouped Data

51 Population Variance and Standard Deviation of Grouped Data

52 Descriptions and Measures of Shape: Skewness: absence of symmetry; presence of extreme values on one side of a distribution. Kurtosis: peakedness of a distribution. Leptokurtic: high and thin peak. Mesokurtic: normal or mound-shaped top. Platykurtic: flat-topped and spread out. Box and whisker plots: graphic display of a distribution using five summary statistics; they reveal skewness and data location or clustering.

53 Probability Distributions Showing Symmetry and Skewness [Figure: three panels labeled "Symmetrical", "Right or Positively Skewed", and "Left or Negatively Skewed"]

54 Symmetrical Shape Frequency Histogram Showing Relationship of Mean, Median and Mode

55 Coefficient of Skewness: A summary measure for skewness based on the relationship of the mean to the median and the variation in the data: Sk = 3(µ - Md) / σ, where Md is the median. If Sk < 0, the distribution is negatively skewed (skewed to the left). If Sk = 0, the distribution is symmetric (not skewed). If Sk > 0, the distribution is positively skewed (skewed to the right).
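The sign rules can be checked on small data sets. The sketch below assumes the Pearsonian form Sk = 3(µ - Md) / σ, which matches the slide's description (mean relative to median, scaled by the variation) but is not spelled out in the transcript; the function name is an assumption, and the first data set is the one from slide 27.

```python
import statistics

def pearson_skewness(values):
    """Sk = 3 * (mean - median) / population standard deviation."""
    mu = statistics.fmean(values)
    median = statistics.median(values)
    sigma = statistics.pstdev(values)
    return 3 * (mu - median) / sigma

print(round(pearson_skewness([5, 9, 16, 17, 18]), 2))   # -1.77 (mean 13 < median 16)
print(pearson_skewness([1, 2, 3, 4, 5]))                # 0.0   (mean equals median)
```

A mean below the median gives a negative coefficient, indicating a distribution skewed to the left.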

56 Effect of Changes in Mean on the Coefficient of Skewness

57 Types of Kurtosis

58 Requirements for a Box and Whisker Plot: Five specific numbers are used: the median (Q2), the first quartile (Q1), the third quartile (Q3), the minimum value in the data set, and the maximum value in the data set. Inner fences, the first indicators of extreme values: IQR = Q3 - Q1; lower inner fence = Q1 - 1.5 IQR; upper inner fence = Q3 + 1.5 IQR. Outer fences, strong indicators of extreme values: lower outer fence = Q1 - 3.0 IQR; upper outer fence = Q3 + 3.0 IQR.
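The fences can be computed from the two quartiles alone. The sketch below assumes the conventional multipliers of 1.5 (inner fences) and 3.0 (outer fences), which are not legible in the transcript; the function name `fences` is hypothetical, and the quartiles are those of the slide-20 array under the whole-number location rule.

```python
def fences(q1, q3):
    """Inner and outer fences, assuming multipliers of 1.5 and 3.0 times the IQR."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

limits = fences(111.5, 123.5)   # Q1 and Q3 of {106, 109, ..., 129}; IQR = 12
print(limits)
# {'lower_outer': 75.5, 'lower_inner': 93.5, 'upper_inner': 141.5, 'upper_outer': 159.5}
```

Every value in that array lies between 93.5 and 141.5, so none would be flagged as extreme.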

59 Skewness and the Box Plot: A box and whisker plot can reveal the skewness of a distribution. The location of the median in the box indicates the skewness of the middle 50% of the data: if the median is on the right side of the box, the middle 50% are skewed to the left; if it is on the left side, the middle 50% are skewed to the right. The lengths of the whiskers indicate the skewness of the outer data: if the longest whisker is to the right of the box, the outer data are skewed to the right, and vice versa. See the box and whisker plot on the next slide.

60 Box and Whisker Plot

61 COPYRIGHT Copyright © 2014 John Wiley & Sons Canada, Ltd. All rights reserved. Reproduction or translation of this work beyond that permitted by Access Copyright (The Canadian Copyright Licensing Agency) is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser may make back-up copies for his or her own use only and not for distribution or resale. The author and the publisher assume no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.

