
Dr. Serhat Eren

6.5 DESCRIPTIVE STATISTICS FOR GROUPED DATA


If there were 30 observations of weekly sales, then you had all 30 numbers available to you. When you are trying to solve a problem by analyzing data, this is the best situation to be in. You have what is known as raw or ungrouped data: the individual observations themselves.

Sometimes you do not have access to the individual observations. This may occur for confidentiality reasons, or because you did not collect the data yourself.

This is common if you are using secondary data, and much of the data published on the Web is unavailable in raw form. Thus, often the only thing available to you is what is known as grouped data. For example, suppose you wished to compare the salaries of managers in your organization to national values. The human resource manager may not wish to share individual salary values with you but might give you information in the following form:

Salary                     Frequency
0 < x ≤ $30,000
$30,000 < x ≤ $60,000
$60,000 < x ≤ $90,000
Frequencies (run together in the source): 183183

Grouped data are data that are available only as a frequency distribution. The individual observations are not accessible.

6.5.1 Measures of the Center for Grouped Data

There are three measures of the center: the mean, the median, and the mode. First consider how to estimate the mean of the data set when you have grouped data. For example, consider the amount of time, in minutes, people occupy a table in a particular restaurant. The manager is interested in the center, or the typical length of time that the table is occupied. She has only the following frequency table from 32 observations:

Time (minutes)       Frequency
25.0 < x ≤ 35.0      5
35.0 < x ≤ 45.0      2
45.0 < x ≤ 55.0      4
55.0 < x ≤ 65.0      3
65.0 < x ≤ 75.0      11
75.0 < x ≤ 85.0      3
85.0 < x ≤ 95.0      4

Remember that to calculate the mean you sum all the data and divide by the sample size.

But for grouped data you cannot sum the actual data because you don't have them. So, you have to estimate what the values might sum to for each interval. Consider the 5 observations that fall in the first interval, between 25 and 35 minutes. We need a way to estimate the sum of those 5 values to begin our estimation of the mean.

It seems reasonable to use the middle of the interval as our best "guess" of the actual values in the class. So, you must first find the midpoint of each class. In this data set, the 5 values for table times that fall between 25 and 35 minutes are assumed to be spread evenly throughout the interval, so that the middle value of 30 minutes is a good representation of the data in that interval.

Since there are 5 of them, you multiply the midpoint of 30 by the frequency of 5 to get the contribution to the sum for that interval. This is like adding 5 values of 30 together. This process is repeated for each interval, and then the sums are added together and divided by the sample size. The details are shown in the next example.


This procedure is summarized in the steps below. It gives you a good estimate of the mean when the data are in fact evenly spread out throughout each interval.
- Step 1. Find the midpoint of each class. Call it m_j.
- Step 2. Multiply the midpoint by the class frequency, f_j, to yield f_j m_j.
- Step 3. Add up all the interval sums found in step 2.
- Step 4. Divide the sum from step 3 by the sample size, n. Note that the sample size is the sum of all the frequencies.
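The four steps above can be sketched in Python. This is only an illustration: the function name `grouped_mean` and the `(lower, upper, frequency)` layout of `table` are assumptions made here, while the numbers are the restaurant frequency table given earlier.

```python
# Restaurant table from the slides: (lower bound, upper bound, frequency).
table = [
    (25.0, 35.0, 5),
    (35.0, 45.0, 2),
    (45.0, 55.0, 4),
    (55.0, 65.0, 3),
    (65.0, 75.0, 11),
    (75.0, 85.0, 3),
    (85.0, 95.0, 4),
]


def grouped_mean(table):
    """Estimate the mean from grouped data using class midpoints."""
    # The sample size is the sum of all the class frequencies.
    n = sum(freq for _, _, freq in table)
    if n == 0:
        raise ValueError("the table contains no observations")
    # Steps 1-3: midpoint of each class times its frequency, summed over classes.
    total = sum(freq * (lower + upper) / 2 for lower, upper, freq in table)
    # Step 4: divide by the sample size.
    return total / n


print(grouped_mean(table))  # 61.875
```

The weighted sum of midpoints is 1980 and n is 32, so the estimated typical table time is a little under 62 minutes.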

The formula for estimating the mean from grouped data is thus

$$\bar{x} \approx \frac{\sum_{j} f_j m_j}{n}, \qquad n = \sum_{j} f_j$$

where m_j is the midpoint of class j and f_j is its frequency.


Recall that the median is the data value of the middle observation in an ordered set of data; thus it is the value at or below which half (50%) of the data values fall. So to find the median for grouped data, we need to find the midpoint of the interval that contains the data value whose cumulative relative frequency is 0.50.
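As a rough sketch of this rule, the code below accumulates relative frequencies class by class and returns the midpoint of the first class whose cumulative relative frequency reaches 0.50. The name `grouped_median` and the tuple layout are assumptions for illustration; the data are the restaurant table.

```python
# Restaurant table from the slides: (lower bound, upper bound, frequency).
table = [
    (25.0, 35.0, 5),
    (35.0, 45.0, 2),
    (45.0, 55.0, 4),
    (55.0, 65.0, 3),
    (65.0, 75.0, 11),
    (75.0, 85.0, 3),
    (85.0, 95.0, 4),
]


def grouped_median(table):
    """Midpoint of the class in which the cumulative relative frequency reaches 0.50."""
    n = sum(freq for _, _, freq in table)
    if n == 0:
        raise ValueError("the table contains no observations")
    cumulative = 0
    for lower, upper, freq in table:
        cumulative += freq
        # The median class is the first one at which half of the data is covered.
        if cumulative / n >= 0.5:
            return (lower + upper) / 2
    raise ValueError("unreachable for a non-empty table")


print(grouped_median(table))  # 70.0
```

Here the cumulative counts are 5, 7, 11, 14, 25, so the 65 to 75 minute class is the first to cover half of the 32 observations.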

Recall that the mode is the data value that has the highest frequency of occurrence in the sample. Using this definition, it is easy to see that the modal class is the class interval in the frequency distribution that has the highest frequency. The estimate of the mode is then the midpoint of the modal class.
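A minimal sketch of the modal-class estimate, using the restaurant table. The name `grouped_mode` is an assumption, and so is the tie handling: when two classes share the highest frequency, this version simply returns the first one.

```python
# Restaurant table from the slides: (lower bound, upper bound, frequency).
table = [
    (25.0, 35.0, 5),
    (35.0, 45.0, 2),
    (45.0, 55.0, 4),
    (55.0, 65.0, 3),
    (65.0, 75.0, 11),
    (75.0, 85.0, 3),
    (85.0, 95.0, 4),
]


def grouped_mode(table):
    """Midpoint of the modal class (the class with the highest frequency)."""
    if not table:
        raise ValueError("the table is empty")
    # max() returns the first class with the largest frequency if there is a tie.
    lower, upper, _ = max(table, key=lambda row: row[2])
    return (lower + upper) / 2


print(grouped_mode(table))  # 70.0
```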


6.5.2 Measures of Dispersion for Grouped Data

With grouped data, the sample range can be estimated by taking the difference between the upper value of the last class and the lower value of the first class. To adapt the formula for the sample variance for use with grouped data, we need to take the same approach that we used for estimating the sample mean for grouped data.

In particular, we need to adapt the formula for the sample variance shown below to accommodate the fact that we no longer have the individual data values represented by x_i in the formula:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

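The following formula and steps estimate the sample variance for grouped data:

$$s^2 \approx \frac{\sum_{j} f_j (m_j - \bar{x})^2}{n - 1}$$

where m_j is the midpoint of class j, f_j is its frequency, and \bar{x} is the estimate of the sample mean obtained from the grouped data.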

- Step 1. Find the midpoint of each class. Call it m_j.
- Step 2. Subtract the estimate of the sample mean from each class midpoint. Square the difference.
- Step 3. Multiply the result of step 2 by the class frequency.
- Step 4. Add up the results of step 3 for all classes.
- Step 5. Divide the sum from step 4 by one less than the sample size, n - 1.
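The five steps can be sketched as follows for the restaurant table. The function name `grouped_variance` and the data layout are assumptions for illustration; the mean used in step 2 is the grouped estimate computed from the same midpoints.

```python
# Restaurant table from the slides: (lower bound, upper bound, frequency).
table = [
    (25.0, 35.0, 5),
    (35.0, 45.0, 2),
    (45.0, 55.0, 4),
    (55.0, 65.0, 3),
    (65.0, 75.0, 11),
    (75.0, 85.0, 3),
    (85.0, 95.0, 4),
]


def grouped_variance(table):
    """Estimate the sample variance from grouped data using class midpoints."""
    n = sum(freq for _, _, freq in table)
    if n < 2:
        raise ValueError("at least two observations are needed")
    # Step 1: midpoint of each class, paired with its frequency.
    midpoints = [((lower + upper) / 2, freq) for lower, upper, freq in table]
    # Grouped estimate of the sample mean.
    mean = sum(freq * m for m, freq in midpoints) / n
    # Steps 2-4: squared deviation of each midpoint, weighted by frequency, summed.
    weighted_squares = sum(freq * (m - mean) ** 2 for m, freq in midpoints)
    # Step 5: divide by n - 1.
    return weighted_squares / (n - 1)


variance = grouped_variance(table)
print(round(variance, 2))         # 370.56
print(round(variance ** 0.5, 2))  # 19.25 (estimated standard deviation, minutes)
```

The weighted sum of squared deviations is 11487.5, which is divided by 31.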

6.6 MEASURES OF RELATIVE STANDING

6.6.1 Percentiles

It is useful in some real situations to know what data value in a sample has a certain percentage of the sample above or below it. This measure is known as the percentile of the data. The pth percentile of a data set is the value that has p% of the data at or below it.

Two questions can be asked involving percentiles:
- What value has p% of the data at or below it?
- What is the percentile rank of a particular data value?

The first question involves finding either a particular percentile or a set of percentiles, such as the deciles (10%, 20%, ..., 90%). The second question involves finding the percentile rank of a particular value in a data set.
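One way to answer the first question is sketched below. It applies the definition literally and returns the smallest data value that has at least p% of the data at or below it (often called the nearest-rank method). This is an assumption about the procedure, not necessarily the one used in the course, and other texts interpolate between neighbouring values. The name `percentile_value` and the `weekly_sales` numbers are hypothetical.

```python
import math

# Hypothetical sample, made up for illustration only.
weekly_sales = [22, 35, 12, 40, 17, 28, 15, 30, 20, 25]


def percentile_value(data, p):
    """Smallest data value with at least p% of the data at or below it."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)
    # Rank (1-based) of the first ordered value that covers p% of the sample.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


print(percentile_value(weekly_sales, 25))  # 17
print(percentile_value(weekly_sales, 50))  # 22

# The deciles are the 10th, 20th, ..., 90th percentiles.
deciles = [percentile_value(weekly_sales, p) for p in range(10, 100, 10)]
print(deciles)  # [12, 15, 17, 20, 22, 25, 28, 30, 35]
```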


The percentile rank of a value is the percentage of the data in the sample that are at or below the value of interest. This measure allows you to determine the relative standing of an observation in a set of data. To find the percentile rank of an observation, the data must be put in numerical order.


The percentile rank, P, is then found from:
b = the number of data values below the value of interest
e = the number of data values equal to the value of interest
n = the sample size
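The sketch below computes a percentile rank from b, e, and n. It assumes the "at or below" definition of percentile rank, that is P = 100 (b + e) / n. Some textbooks instead count only half of the tied values, 100 (b + e/2) / n, so treat the exact formula as an assumption. The function name and the `weekly_sales` numbers are hypothetical.

```python
# Hypothetical sample, made up for illustration only.
weekly_sales = [22, 35, 12, 40, 17, 28, 15, 30, 20, 25]


def percentile_rank(data, value):
    """Percentage of the data at or below `value` (assumed definition)."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    b = sum(1 for x in data if x < value)   # values below the value of interest
    e = sum(1 for x in data if x == value)  # values equal to the value of interest
    return 100 * (b + e) / n


print(percentile_rank(weekly_sales, 22))  # 50.0
```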

6.6.2 Quartiles

There are certain percentiles that are used frequently. These percentiles are the 25th percentile and the 75th percentile, also known as the first and third quartiles. The first quartile, Q_1, is the value in the sample that has 25% of the data at or below it. The third quartile, Q_3, is the value in the sample that has 75% of the data at or below it.

Since percentiles and quartiles are order statistics, finding them requires that the data set be sorted from lowest to highest.
- Step 1: Put the data set in order and find the median of the data.
- Step 2: Take the lower half of the data and find the median of the lower half of the data. This value will be the first quartile, Q_1.
- Step 3: Take the upper half of the data and find the median of the upper half of the data. This value will be the third quartile, Q_3.
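The three steps can be sketched with the standard library's `statistics.median`. One detail is not settled by the steps themselves: when the sample size is odd, this sketch leaves the middle value out of both halves, which is an assumption. The function name and the `weekly_sales` numbers are hypothetical.

```python
from statistics import median

# Hypothetical sample, made up for illustration only.
weekly_sales = [22, 35, 12, 40, 17, 28, 15, 30, 20, 25]


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    if len(data) < 2:
        raise ValueError("at least two observations are needed")
    ordered = sorted(data)                 # Step 1: put the data in order
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: for odd n the middle value belongs to neither half.
    upper_half = ordered[half + n % 2:]
    q1 = median(lower_half)                # Step 2
    q3 = median(upper_half)                # Step 3
    return q1, median(ordered), q3


print(quartiles(weekly_sales))  # (17, 23.5, 30)
```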


6.6.3 Displaying the Data Using Box-plots

A box-plot, or box-and-whisker diagram, is a graphical display that uses summary statistics to display the distribution of a set of data. A box-plot summarizes a sample using the quartiles and the median. If you look at the first and third quartiles of a sample, Q_1 and Q_3, you see that 50% of the data in the sample fall between these two values. The distance between these two values is called the interquartile range (IQR).

The interquartile range (IQR) is the difference between the third and first quartiles, Q_3 - Q_1. Figure 6.6 provides a partial picture of the data set. To complete the description with the empirical rule, we used two additional intervals: the mean ± 2 standard deviations and the mean ± 3 standard deviations.

6.6.4 Using a Box-plot to Identify Outliers

Sample data that fall between the inner and outer fences are called possible outliers, while data values that fall beyond the outer fences are called probable outliers. If you are having trouble figuring out the difference between probable and possible, think about the difference in your reaction when I tell you, "It is possible that you will pass this course" vs. "It is probable that you will pass this course."
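The transcript does not say where the fences are placed, so the sketch below assumes the common convention: inner fences at 1.5 IQR beyond the quartiles and outer fences at 3 IQR beyond them. The function name, the sample, and the quartile values passed in are hypothetical and serve only to illustrate the possible/probable distinction.

```python
# Hypothetical sample, made up for illustration only.
sample = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 55, 80]


def classify_outliers(data, q1, q3):
    """Split outliers into (possible, probable) using assumed fence positions."""
    iqr = q3 - q1
    # Assumed convention: inner fences at 1.5 * IQR, outer fences at 3 * IQR.
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    # Possible outliers lie between an inner fence and the matching outer fence.
    possible = [x for x in data
                if outer_low <= x < inner_low or inner_high < x <= outer_high]
    # Probable outliers lie beyond an outer fence.
    probable = [x for x in data if x < outer_low or x > outer_high]
    return possible, probable


# Quartiles are taken as given here (Q1 = 17, Q3 = 30, so IQR = 13).
possible, probable = classify_outliers(sample, q1=17, q3=30)
print(possible)  # [55]
print(probable)  # [80]
```

With these quartiles the inner fences are -2.5 and 49.5 and the outer fences are -22 and 69, so 55 is a possible outlier and 80 a probable one.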


