1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke
2 Objectives
- Define measures of central tendency, variability, shape and association
- Define statistical measures
- Compute statistical measures for ungrouped and grouped data
- Interpret statistical results
3 Introduction
In most competitive sports, one looks at the position of the athletes, e.g. who came in first, second, and so on. In statistics, one is interested in the following measures:
- the most frequent value in the data set
- a summary of all values in the data set
- the midpoint position of the data set
- the positions of data in the data set
- the distances to the midpoint of the data set
4 Exercise: Statistical Measure 1
We want to find out which of the following students is the better one using the available data. The data shows the positions of the two competitors in several rounds of testing.
Kuli:  1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st
5 Response: Commonsense Approach
We want to find out which of the following students is the better one using the available data.
Kuli:  1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st
- 3 times Kuli was 1st and Marti was behind
- 3 times Marti was 1st and Kuli was behind
- Marti had more 2nd places
- Marti had more 3rd places
Imagine that you had a data set with 500 values!
6 Mode
- The most frequently occurring value in a data set
- Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
- Bimodal: data sets that have two modes
- Multimodal: data sets that contain more than two modes
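The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `modes` is hypothetical, and the data are the finishing positions from the commonsense-approach slide, coded as 1 for 1st place, 2 for 2nd place, and so on.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Finishing positions from the commonsense-approach slide (1 = 1st place).
kuli = [1, 2, 1, 2, 1, 4, 3, 3, 2, 5, 1]
marti = [3, 2, 3, 2, 2, 1, 1, 1, 3, 2, 1]

print(modes(kuli))   # [1]     one mode
print(modes(marti))  # [1, 2]  two modes, so the data set is bimodal
```

Returning a list rather than a single value keeps bimodal and multimodal data sets visible instead of silently picking one mode.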
7 Median
- Middle value in an ordered array of numbers
- Applicable for ordinal, interval, and ratio data
- Not applicable for nominal data
- Unaffected by extremely large and extremely small values
8 Median: Computational Procedure
First Procedure
- Arrange the observations in an ordered array.
- If there is an odd number of terms, the median is the middle term of the ordered array.
- If there is an even number of terms, the median is the average of the middle two terms.
Second Procedure
- The median’s position in an ordered array is given by (n+1)/2.
9 Median: Odd Number Example (Long method)
Ordered Array
- There are 17 terms in the ordered array.
- Position of median = (n+1)/2 = (17+1)/2 = 9
- The median is the 9th term, which is 15.
- If the 22 is replaced by 100, the median is 15.
- If the 3 is replaced by -103, the median is 15.
10 Median: Even Number Example (Long Method)
Ordered Array
- There are 16 terms in the ordered array.
- Position of median = (n+1)/2 = (16+1)/2 = 8.5
- The median is between the 8th and 9th terms: 14.5.
NOTE
- If the 21 is replaced by 100, the median is 14.5.
- If the 3 is replaced by -88, the median is 14.5.
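The median procedure for odd and even numbers of terms can be sketched as follows. The function name `median` and both small data sets are hypothetical, chosen only to show the two cases and the median's insensitivity to extreme values.

```python
def median(values):
    """Median via the ordered-array procedure: middle term, or mean of the two middle terms."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the term at 1-based position (n + 1) / 2.
        return ordered[middle]
    # Even n: average the two terms on either side of position (n + 1) / 2.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_example = [7, 3, 9, 15, 22]   # hypothetical data, 5 terms
even_example = [3, 7, 9, 15]      # hypothetical data, 4 terms

print(median(odd_example))            # 9
print(median([7, 3, 9, 15, 100]))     # 9, replacing the largest value changes nothing
print(median(even_example))           # 8.0
```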
11 Arithmetic Mean
- Commonly called ‘the mean’
- Is the average of a group of numbers
- Applicable for interval and ratio data
- Not applicable for nominal or ordinal data
- Affected by each value in the data set, including extreme values
- Computed by summing all values in the data set and dividing the sum by the number of values in the data set
12 Population Mean (Long method)
Data for total population: 57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43
Population mean = (Σ Xi)/N = 653/12 = 54.42
13 Computing Sample Mean (Long method)
The population mean is not the same thing as the sample mean! Our numbers (57, 86, 42) are a sample drawn from the population, and hence a small segment of it.
Sample mean = (Σ Xi)/n = 185/3 = 61.67
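A minimal sketch of the long method, using the population data and the three-value sample from the slides. The helper name `mean` is an assumption for illustration; the same formula is applied to both lists, and only the data differ.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    return sum(values) / len(values)


population = [57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43]
sample = [57, 86, 42]  # a small segment drawn from the population

print(round(mean(population), 2))  # 54.42  (653 / 12)
print(round(mean(sample), 2))      # 61.67  (185 / 3)
```

The two results differ noticeably, which is the point of the slide: a sample mean only estimates the population mean.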
14 Computing Central Tend. Measures using Frequency Tables (Compact method)
Data for total population: 55, 55, 60, 100, 100, 100, 125, 125, 125, 125, 125, 140, 140, 140, 140
Xi      Fi    Fi*Xi
55      2     110
60      1     60
100     3     300
125     5     625
140     4     560
Total   15    1655
Mean = (Σ Fi*Xi)/(Σ Fi) = 1655/15 = 110.33
Mode = 125
Median position = (n+1)/2 = (15+1)/2 = 8th
Median value = 125
THIS IS THE TYPE OF APPROACH YOU NEED TO MASTER FOR YOUR EXAM.
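The compact method can be sketched in Python by storing the frequency table as a dictionary from value to frequency. The function names are hypothetical, and the table is built from the population data listed on the slide.

```python
def freq_mean(table):
    """Mean = sum(Fi * Xi) / sum(Fi)."""
    n = sum(table.values())
    return sum(x * f for x, f in table.items()) / n


def freq_modes(table):
    """All values whose frequency equals the highest frequency."""
    highest = max(table.values())
    return sorted(x for x, f in table.items() if f == highest)


def value_at(table, position):
    """Value at a 1-based position in the ordered data, found by running totals."""
    running = 0
    for x in sorted(table):
        running += table[x]
        if position <= running:
            return x
    raise ValueError("position exceeds the number of observations")


def freq_median(table):
    """Median position is (n + 1) / 2; for even n, average the two neighbouring terms."""
    n = sum(table.values())
    lower, upper = (n + 1) // 2, n // 2 + 1
    return (value_at(table, lower) + value_at(table, upper)) / 2


# Frequency table for: 55, 55, 60, 100 x3, 125 x5, 140 x4
table = {55: 2, 60: 1, 100: 3, 125: 5, 140: 4}

print(round(freq_mean(table), 2))  # 110.33
print(freq_modes(table))           # [125]
print(freq_median(table))          # 125.0
```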
15 Exercise: Computing Central Tend. Measures using Frequency Tables
Table columns: Xi, Fi, Fi*Xi (n = 14)
Mean = (Σ Fi*Xi)/(Σ Fi) =
Mode =
Median position =
Median value =
16 Response: Computing Central Tend. Measures using Frequency Tables
Table columns: Xi, Fi, Fi*Xi (n = 14, Σ Fi*Xi = 82)
Mean = (Σ Fi*Xi)/(Σ Fi) = 82/14 = 5.86
Mode = 6 and 4
Median position = (n+1)/2 = (14+1)/2 = 7.5 (between 7th and 8th)
Median value = (6+6)/2 = 6
17 Opening Exercise: Using Statistical Measures
Kuli:  1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st
Mode: most frequently occurring value of a variable
- Mode for Kuli: 1st
- Mode for Marti: 1st and 2nd (each occurs four times)
Mean: average of the values of a variable
Sample mean = (Σ Xi)/n
- Mean or average score for Kuli: 25/11 = 2.27
- Mean or average score for Marti: 21/11 = 1.91
18 Using Statistical Measures
Kuli:  1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st
Median: the value in the middle of an ordered data set of n values.
Median point = (n + 1)/2 = (11 + 1)/2 = 6th position
Ordered data:
Kuli:  1st 1st 1st 1st 2nd 2nd 2nd 3rd 3rd 4th 5th
Marti: 1st 1st 1st 1st 2nd 2nd 2nd 2nd 3rd 3rd 3rd
- Median score for Kuli is 2nd
- Median score for Marti is 2nd
Notice that the median requires an ordered set.
19 Using Frequency Distribution Tables
Analysis of Kuli’s performance
Xi      Frequency (Fi)   Fi*Xi   Cum. (CFi)
1st     4                4       4
2nd     3                6       7
3rd     2                6       9
4th     1                4       10
5th     1                5       11
Total   11               25
Mean = (Σ Fi*Xi)/(Σ Fi) = 25/11 = 2.27
Mode = 1st
Median point = (11 + 1)/2 = 6th
Median value = 2nd (using the cumul. freq. column, the 6th position falls in the 2nd-place row)
20 Using Frequency Distribution Tables
Analysis of Marti’s performance
Xi      Frequency (Fi)   Fi*Xi   Cum. (CFi)
1st     4                4       4
2nd     4                8       8
3rd     3                9       11
4th     0                0       11
5th     0                0       11
Total   11               21
Mean = (Σ Fi*Xi)/(Σ Fi) = 21/11 = 1.91
Mode = 1st & 2nd
Median point = (11 + 1)/2 = 6th
Median value = 2nd (using the cumul. freq. column, the 6th position falls in the 2nd-place row)
21 Using Frequency Distribution Tables
Who is the better student?
               Marti        Kuli
Mean           1.91         2.27
Median value   2nd          2nd
Mode           1st & 2nd    1st
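The frequency distribution tables for the two students, including the cumulative column used to locate the median, can be built along these lines. The helper names are hypothetical; positions are coded as integers (1 = 1st place), following the slides' own treatment of the positions as scores.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Rows of (Xi, Fi, Fi*Xi, cumulative Fi), ordered by Xi."""
    counts = Counter(values)
    xs = sorted(counts)
    fs = [counts[x] for x in xs]
    cumulative = accumulate(fs)
    return [(x, f, f * x, cf) for x, f, cf in zip(xs, fs, cumulative)]


def median_from_table(rows):
    """First Xi whose cumulative frequency reaches position (n + 1) / 2 (odd n only)."""
    n = rows[-1][3]
    position = (n + 1) // 2
    for x, _, _, cf in rows:
        if cf >= position:
            return x


kuli = [1, 2, 1, 2, 1, 4, 3, 3, 2, 5, 1]
marti = [3, 2, 3, 2, 2, 1, 1, 1, 3, 2, 1]

for name, data in (("Kuli", kuli), ("Marti", marti)):
    rows = frequency_table(data)
    mean = sum(fx for _, _, fx, _ in rows) / len(data)
    print(name, rows, round(mean, 2), median_from_table(rows))
```

On these data a lower mean position is better, so the mean favours Marti while the median does not separate the two students.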
22 New Case: Median measure
Analysis of Katie’s performance
Table columns: Xi (1st to 4th), Frequency (Fi), Fi*Xi, Cum. (CFi)
Mean = (Σ Fi*Xi)/(Σ Fi) = 31/12 = 2.58
Mode = 3rd
Median point = (12 + 1)/2 = 6.5th, so the median value is between the 6th and 7th positions
Median value = (2nd + 3rd)/2 = 2.5, the average of the values at the 6th and 7th positions
23 Examples
24 Percentiles
Sometimes we are not analyzing several values from one person, but one value for several persons or objects. For example, we have data on the performance of several fund managers for a given year. We want to present the data in the form "manager XX is at the 10th percentile" or "manager XX is at the 25th percentile".
The method used consists of three steps:
- organize the data in ascending order
- calculate the location of the percentile you want
- identify the object at the percentile location in the data set
25 Interpretation: Percentiles
If manager YY is in the 10th percentile of a group, this means that at least 10% of everyone in the data set scored below manager YY and at most 90% of everyone in the data set scored better than manager YY.
If manager Pico is in the 95th percentile of a group, this means that at least 95% of everyone in the data set scored below manager Pico and at most 5% of everyone in the data set scored better than manager Pico.
26 Exercise: Percentiles for Known Values
First name   Fund performance
Bill         106%
Jane         109%
Sven         114%
Larry        116%
Dub          121%
Anna         122%
Cole         125%
Salome       129%
In which percentile is Sven?
27 Deriving Percentiles with Cumulative Relative Frequency Approach for Observed Values
First name   Fund performance   Fi   Rel. fi   Cum. rel. fi   Percentile
Bill         106%               1    1/8       1/8 = 0.125    12.5th Percentile
Jane         109%               1    1/8       2/8 = 0.25     25th Percentile
Sven         114%               1    1/8       3/8 = 0.375    37.5th Percentile
Larry        116%               1    1/8       4/8 = 0.5      50th Percentile
Dub          121%               1    1/8       5/8 = 0.625    62.5th Percentile
Anna         122%               1    1/8       6/8 = 0.75     75th Percentile
Cole         125%               1    1/8       7/8 = 0.875    87.5th Percentile
Salome       129%               1    1/8       8/8 = 1        100th Percentile
N = 8
In which percentile is Sven?
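The cumulative relative frequency approach for observed values can be sketched as below. The function name `percentile_ranks` is hypothetical, and the sketch assumes, as in the fund-manager data, that every value occurs exactly once (each Fi = 1).

```python
def percentile_ranks(records):
    """Map each name to 100 * (cumulative frequency / N), assuming no tied values."""
    ordered = sorted(records, key=lambda record: record[1])  # ascending by performance
    n = len(ordered)
    return {name: 100 * rank / n for rank, (name, _) in enumerate(ordered, start=1)}


# Fund performance in percent, taken from the exercise table.
funds = [
    ("Bill", 106), ("Jane", 109), ("Sven", 114), ("Larry", 116),
    ("Dub", 121), ("Anna", 122), ("Cole", 125), ("Salome", 129),
]

ranks = percentile_ranks(funds)
print(ranks["Sven"])    # 37.5
print(ranks["Salome"])  # 100.0
```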
28 Deriving Percentiles with Cumulative Relative Frequency Approach for Unobserved Values
First name   Fund performance   Fi   Rel. fi   Cum. rel. fi   Percentile
Bill         106%               1    1/8       1/8            12.5th Percentile
Jane         109%               1    1/8       2/8            25th Percentile
Sven         114%               1    1/8       3/8            37.5th Percentile
Larry        116%               1    1/8       4/8            50th Percentile
Dub          121%               1    1/8       5/8            62.5th Percentile
Anna         122%               1    1/8       6/8            75th Percentile
Cole         125%               1    1/8       7/8            87.5th Percentile
Salome       129%               1    1/8       8/8            100th Percentile
N = 8
What is the value of the 90th percentile?
29 Computing Data Values When Given Percentile Locations (Approximate method)
90th percentile:
- Location i = (P/100) * N = 0.9 * 8 = 7.2th position
- The result is not an integer, so the percentile position is rounded up to the 8th position.
- 90th percentile value from the table = 129%
This is an approximate method because the formula gives the same result for multiple percentiles: it gives 129% for the 91st, 92nd, 93rd, and so on up to the 100th percentile.
50th percentile:
- Location i = (P/100) * N = 0.5 * 8 = 4th position
- The result is an integer, so 50th percentile = (4th value + 5th value)/2 = (116% + 121%)/2 = 118.5%
- (But from the table we see that 116% is also the 50th percentile.)
RECOMMENDATION: USE THIS APPROXIMATE FORMULA WHEN YOU ARE DEALING WITH UNOBSERVED VALUES. IF YOU USE THIS APPROACH IN THE EXAM, YOU WILL NOT BE MARKED WRONG.
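A minimal sketch of the approximate method follows. The function name is hypothetical, P is assumed to be a whole-number percentile, and the handling of P = 100 (returning the largest value) is an assumption made so that the function agrees with the 129% result quoted for the top percentiles.

```python
def approx_percentile(values, p):
    """Approximate method: location i = (p / 100) * N, with 0 < p <= 100 and p an integer."""
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids floating-point surprises: i = whole + remainder / 100.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to 1-based position whole + 1.
        return ordered[whole]
    if whole == n:
        # Assumption: there is no (N + 1)-th value, so P = 100 maps to the largest value.
        return ordered[-1]
    # i is an integer: average the i-th and (i + 1)-th values.
    return (ordered[whole - 1] + ordered[whole]) / 2


performance = [106, 109, 114, 116, 121, 122, 125, 129]  # percent, from the table

print(approx_percentile(performance, 90))  # 129
print(approx_percentile(performance, 50))  # 118.5
```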
30 Computing Percentile Locations with Arithmetic Formula (More precise method)
90th percentile:
- Location i = (P/100) * N = 0.9 * 8 = 7.2th position
- The 90th percentile lies 0.2 (20%) of the way between the 7th and 8th values.
- Value = 7th value + (8th value - 7th value) * fractional part of i = 125% + (129% - 125%) * 0.2 = 125.8% (~ 126%)
50th percentile:
- Location i = (P/100) * N = 0.5 * 8 = 4th position
- 50th percentile = 116%
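The more precise method can be sketched as a linear interpolation between neighbouring positions. The function name is hypothetical, P is assumed to be a whole-number percentile, and the two boundary rules (locations below the 1st or at the last position) are assumptions, since the slides do not cover them.

```python
def interpolated_percentile(values, p):
    """Value at location i = (p / 100) * N, interpolating between neighbouring positions."""
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # i = whole + remainder / 100
    if whole == 0:
        return ordered[0]    # assumption: locations before the 1st position use the smallest value
    if whole == n:
        return ordered[-1]   # assumption: location N uses the largest value
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + (upper - lower) * remainder / 100


performance = [106, 109, 114, 116, 121, 122, 125, 129]  # percent, from the table

print(round(interpolated_percentile(performance, 90), 1))  # 125.8
print(interpolated_percentile(performance, 50))            # 116.0
```

Unlike the approximate method, neighbouring percentiles such as the 90th and 95th now map to different values.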
31 Overview Measures and Summary of Conditions for Using Descriptive Measures
The use of statistical measures depends on the level of measurement of the data. For some levels, e.g. the nominal level, many statistical measures cannot be used.
32 Descriptive Measures for Grouped Data
The mean, median and mode can all be computed for quantitative data sets that were measured at an appropriate level.
33 Exercise: Central Tendency Measures for Grouped Data
Class interval     Frequency (Fi)   Midpoint (Mi)
[1 – 3) inches     16               2
[3 – 5) inches     2                4
[5 – 7) inches     4                6
[7 – 9) inches     3                8
[9 – 11) inches    9                10
[11 – 13) inches   6                12
Total              40
Modal class:
Median position:
Median class:
34 Response: Central Tendency Measures for Grouped Data
Class interval     Frequency (Fi)   Midpoint (Mi)
[1 – 3) inches     16               2
[3 – 5) inches     2                4
[5 – 7) inches     4                6
[7 – 9) inches     3                8
[9 – 11) inches    9                10
[11 – 13) inches   6                12
Total              40
Modal class: [1 – 3) inches
Median position: (n+1)/2 = 41/2 = 20.5, between the 20th and 21st positions
Median class: [5 – 7) inches (this would be hard to derive if the median were between the 18th and 19th positions, i.e. if it crossed two classes)
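The modal class and median class can be located with running totals, as in this sketch. The function names are hypothetical. To make the two-class problem mentioned above visible, `median_classes` returns the class of both middle positions rather than a single class.

```python
def modal_class(classes):
    """Class interval with the highest frequency."""
    return max(classes, key=lambda item: item[1])[0]


def class_of_position(classes, position):
    """Class interval containing the given 1-based position, via cumulative frequencies."""
    running = 0
    for interval, freq in classes:
        running += freq
        if position <= running:
            return interval
    raise ValueError("position exceeds the number of observations")


def median_classes(classes):
    """Classes of the two positions around (n + 1) / 2; identical when the median sits in one class."""
    n = sum(freq for _, freq in classes)
    lower, upper = (n + 1) // 2, n // 2 + 1
    return class_of_position(classes, lower), class_of_position(classes, upper)


# (class interval in inches, frequency), from the grouped-data table
classes = [((1, 3), 16), ((3, 5), 2), ((5, 7), 4), ((7, 9), 3), ((9, 11), 9), ((11, 13), 6)]

print(modal_class(classes))     # (1, 3)
print(median_classes(classes))  # ((5, 7), (5, 7))
```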
35 Example: Central Tendency Measures for Grouped Data
Class interval     Frequency (Fi)   Midpoint (Mi)   (Fi)*(Mi)
[1 – 3) inches     16               2               32
[3 – 5) inches     2                4               8
[5 – 7) inches     4                6               24
[7 – 9) inches     3                8               24
[9 – 11) inches    9                10              90
[11 – 13) inches   6                12              72
Total              40                               250
Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n = 250/40 = 6.25 inches
36 Exercise: Central Tendency Measures for Grouped Data
Class interval     Frequency (Fi)   Midpoint (Mi)   (Fi)*(Mi)
[1 – 2) inches     2
[2 – 3) inches     2
[3 – 4) inches     4
[4 – 5) inches     2
[5 – 6) inches     1
Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n =          inches
37 Response: Central Tendency Measures for Grouped Data
Class interval     Frequency (Fi)   Midpoint (Mi)   (Fi)*(Mi)
[1 – 2) inches     2                1.5             3
[2 – 3) inches     2                2.5             5
[3 – 4) inches     4                3.5             14
[4 – 5) inches     2                4.5             9
[5 – 6) inches     1                5.5             5.5
Total              11                               36.5
Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n = 36.5/11 = 3.32 inches
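The grouped-data mean can be sketched as follows. The function name is hypothetical. Each midpoint is computed from the class boundaries as (lower + upper) / 2 instead of being typed in, which guards against a mistyped midpoint; the frequencies are those of the two grouped-data tables.

```python
def grouped_mean(classes):
    """Mean = sum(Fi * Mi) / n, where Mi is the midpoint of each class interval."""
    n = sum(freq for _, _, freq in classes)
    total = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)
    return total / n


# (lower bound, upper bound, frequency), in inches
wide_classes = [(1, 3, 16), (3, 5, 2), (5, 7, 4), (7, 9, 3), (9, 11, 9), (11, 13, 6)]
narrow_classes = [(1, 2, 2), (2, 3, 2), (3, 4, 4), (4, 5, 2), (5, 6, 1)]

print(grouped_mean(wide_classes))              # 6.25  (sum of Fi*Mi is 250, n is 40)
print(round(grouped_mean(narrow_classes), 2))  # 3.32  (sum of Fi*Mi is 36.5, n is 11)
```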
38 Excel Examples