What is variability in data? Measuring how much the group as a whole deviates from the center. Gives you an indication of what is the spread of the data.


1 What is variability in data?
Variability measures how much the group as a whole deviates from the center, and so indicates the spread of the data. The common measures of variation in data are the range, deviation, variance and standard deviation.
2.4 Measures of Variation

2 Range
The range is the simplest measure of variation. It is the difference between the largest and smallest entries in the data set.
Range = Maximum value - Minimum value
The range has the advantage of being easy to compute. Its disadvantage, however, is that it uses only two entries from the entire data set.
Age based on class survey data: 26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23, 48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38.
Range = maximum - minimum = 48 - 16 = 32
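The range calculation above can be reproduced in a few lines. This is a minimal sketch using the age data from the slide; the variable names `ages` and `data_range` are illustrative choices, not part of the original.

```python
# Age data from the class survey on the slide.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

# Range = maximum value - minimum value; only two entries are used.
data_range = max(ages) - min(ages)
print(data_range)  # 32
```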

3 Deviation, Variance and Standard Deviation
The deviation of an entry $x_i$ in a data set is the difference between that entry and the mean $\mu$ of the data set, i.e. $x_i - \mu$.
The population variance of a population data set of $N$ entries is: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
The population standard deviation is the square root of the population variance, i.e. $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
The sample variance of a sample data set of $n$ entries with sample mean $\bar{x}$ is: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
The sample standard deviation is the square root of the sample variance, i.e. $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

4 Deviation, Variance and Standard Deviation
Age based on class survey: 26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23, 48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38.
Population size $N = 34$, population mean $\mu = 1024/34 = 30.11765$

Age ($x_i$)    $x_i - \mu$    $(x_i - \mu)^2$
26             -4.1176        16.9550
25             -5.1176        26.1903
:              :              :
38              7.8824        62.1315
                              $\Sigma = 2797.5294$

$\sigma^2 = 2797.5294/34 = 82.2803$
$\sigma = \sqrt{82.2803} = 9.0708$
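The worked example above can be checked with a short script. This is a sketch under the slide's assumption that the 34 ages form the whole population; the sample versions at the end treat the same data as a sample purely for comparison, which is an assumption added here and not something the slide does.

```python
import math

# Age data from the class survey on the slide.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

N = len(ages)
mu = sum(ages) / N                              # population mean
sum_sq_dev = sum((x - mu) ** 2 for x in ages)   # sum of squared deviations

pop_var = sum_sq_dev / N                        # population variance (divide by N)
pop_sd = math.sqrt(pop_var)                     # population standard deviation

# Assumption for comparison only: the same data treated as a sample (divide by n - 1).
sample_var = sum_sq_dev / (N - 1)
sample_sd = math.sqrt(sample_var)

print(f"mean = {mu:.5f}, sum of squared deviations = {sum_sq_dev:.4f}")
print(f"population variance = {pop_var:.4f}, population sd = {pop_sd:.4f}")
print(f"sample variance = {sample_var:.4f}, sample sd = {sample_sd:.4f}")
```

Dividing by n - 1 instead of N always gives a slightly larger variance, so the sample figures come out a little above the population figures for the same data.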

5 Deviation, Variance and Standard Deviation
Variance and standard deviation take all the data into consideration. However, both are easily influenced by extreme values, because each deviation is squared. Variance is hard to interpret since it is in squared units; the standard deviation is in the same units as the data and is interpreted as the typical deviation from the mean.

6 Interpreting Standard Deviation
When interpreting the standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

7 Interpreting Standard Deviation
Empirical Rule, or the 68-95-99.7 rule: for a bell-shaped, symmetric distribution, about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations of the mean and about 99.7% lies within three standard deviations of the mean.
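The three percentages in the Empirical Rule are rounded values for the normal distribution. The sketch below recovers them from the standard normal distribution using `statistics.NormalDist` from the Python standard library; the names `std_normal` and `coverage` are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist()

# Probability of falling within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

The printed values round to 68%, 95% and 99.7%, which is where the rule gets its name.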

8 Interpreting Standard Deviation
Chebychev’s theorem
When the distribution is not bell shaped or symmetric, this theorem gives a lower bound on the proportion of data that lies within $k$ standard deviations of the mean. It states that the proportion of any data set lying within $k$ standard deviations of the mean ($k > 1$) is at least $1 - \frac{1}{k^2}$.
For $k = 2$: in any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, i.e. 75%, of the data lies within 2 standard deviations of the mean.
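To see the bound in action, the following sketch compares the Chebychev lower bound with the observed proportion in the age data from the earlier slides. The helper names `chebyshev_bound` and `proportion_within` are hypothetical, and the population mean and standard deviation are used, as on slide 4.

```python
import math

# Age data from the class survey on the earlier slides.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

mu = sum(ages) / len(ages)
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / len(ages))


def chebyshev_bound(k):
    """Lower bound on the proportion within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2


def proportion_within(k):
    """Observed proportion of the ages within k standard deviations of the mean."""
    return sum(1 for x in ages if abs(x - mu) <= k * sigma) / len(ages)


for k in (2, 3):
    print(f"k = {k}: bound = {chebyshev_bound(k):.4f}, "
          f"observed = {proportion_within(k):.4f}")
```

The observed proportion can be much higher than the bound; the theorem only guarantees that it is never lower.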

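9 Standard Deviation of Grouped Data
The sample standard deviation for a frequency distribution is: $s = \sqrt{\frac{\sum_{i=1}^{c} (x_i - \bar{x})^2 f_i}{n - 1}}$
where $c$ is the number of classes, $x_i$ is the $i$th data value (the class midpoint when the data are grouped into intervals), $f_i$ is the corresponding frequency, $\bar{x}$ is the sample mean and $n = \sum f_i$ is the sample size.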
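As an illustration of the frequency form, the sketch below builds a frequency distribution from the age data on the earlier slides, with each distinct age as its own class, and applies the grouped formula. Treating the ages as a sample and using `collections.Counter` to build the frequencies are assumptions made here for the example.

```python
import math
import statistics
from collections import Counter

# Age data from the class survey, treated here as a sample (an assumption).
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

# Frequency distribution: each distinct age x_i with its frequency f_i.
freq = Counter(ages)

n = sum(freq.values())                               # sample size = sum of f_i
mean = sum(x * f for x, f in freq.items()) / n       # sample mean from frequencies

# Grouped formula: sum of (x_i - mean)^2 * f_i, divided by n - 1.
grouped_sd = math.sqrt(
    sum((x - mean) ** 2 * f for x, f in freq.items()) / (n - 1)
)

# Cross-check against the ungrouped sample standard deviation.
print(grouped_sd, statistics.stdev(ages))
```

Because every class here is a single value, the grouped result agrees with the ungrouped one. With interval classes and midpoints it would only be an approximation.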

10 What are measures of position?
A measure of position gives you some idea of where a particular data value ranks in an ordering of the data set, or where it falls with respect to the mean of the sample or population.
2.5 Measures of Position

11 Quartiles
Quartiles divide the data into 4 equal parts. We need three quartiles, Q1, Q2 and Q3, to divide any data set into 4 equal parts.
About a quarter of the data falls below the first quartile, Q1.
About a half of the data falls below the second quartile, Q2.
About three quarters of the data falls below the third quartile, Q3.
The interquartile range (IQR) of a data set is the difference between the third and first quartiles, Q3 - Q1.

12 Quartiles
In essence, five values can be used to describe a data set: the minimum data value, the three quartiles Q1, Q2 and Q3, and the maximum data value. These five numbers are called the five-number summary since they describe the central tendency, the spread and the variation in the data.
Drawing a box-and-whisker plot
Find the five-number summary of the data set.
Construct a horizontal scale that spans the range of the data.
Plot the five numbers above the horizontal scale.
Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2.
Draw whiskers from the box to the minimum and maximum entries.
For the age data: Min = 16, Q1 = 23.25, Q2 = 27.5, Q3 = 37.5, Max = 48
[Figure: box-and-whisker plot of the age data, labelled Min entry, Q1, Q2 (median), Q3 and Max entry, with a whisker on each side of the box]
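The five-number summary for the age data can be reproduced as follows. Quartile conventions differ between textbooks and software; this sketch assumes the linear-interpolation convention that `statistics.quantiles` calls `method="inclusive"`, which happens to match the quartiles quoted on the slide. Other conventions give slightly different Q1 and Q3.

```python
from statistics import quantiles

# Age data from the class survey on the earlier slides.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

# Three cut points that split the ordered data into 4 equal parts.
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")

five_number_summary = (min(ages), q1, q2, q3, max(ages))
iqr = q3 - q1  # interquartile range

print(five_number_summary)  # (16, 23.25, 27.5, 37.5, 48)
print(iqr)                  # 14.25
```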

13 Percentiles and Other Fractiles
Fractiles are numbers that divide an ordered data set into equal parts. Some commonly used fractiles are:

Fractiles      Summary                                   Symbols
Quartiles      Divide a data set into 4 equal parts      Q1, Q2, Q3
Deciles        Divide a data set into 10 equal parts     D1, D2, D3, ..., D9
Percentiles    Divide a data set into 100 equal parts    P1, P2, P3, ..., P99
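Deciles and percentiles can be computed the same way as quartiles by changing the number of parts. This sketch uses the age data from the earlier slides and again assumes the `method="inclusive"` interpolation convention of `statistics.quantiles`; the variable names are illustrative.

```python
from statistics import quantiles

# Age data from the class survey on the earlier slides.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

deciles = quantiles(ages, n=10, method="inclusive")       # D1 .. D9
percentiles = quantiles(ages, n=100, method="inclusive")  # P1 .. P99

# D5 and P50 both coincide with the median Q2; P25 coincides with Q1.
print(len(deciles), len(percentiles))
print(deciles[4], percentiles[49], percentiles[24])
```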

14 z-score
The standard score, or z-score, represents the number of standard deviations a given value $x$ falls from the mean $\mu$. To find the z-score for a given value, use: $z = \frac{x - \mu}{\sigma}$
A z-score can be positive, negative or zero.
If z is positive, the data point > the mean.
If z is negative, the data point < the mean.
If z = 0, the data point = the mean.
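As a worked illustration, the sketch below computes z-scores for the oldest and youngest ages in the class survey data, using the population mean and standard deviation from slide 4. The function name `z_score` is a hypothetical helper.

```python
import math

# Age data from the class survey on the earlier slides.
ages = [26, 25, 35, 35, 40, 41, 21, 19, 20, 20, 30, 25, 24, 47, 36, 16, 23,
        48, 40, 21, 27, 22, 39, 34, 26, 25, 16, 24, 33, 32, 28, 48, 40, 38]

mu = sum(ages) / len(ages)                                       # population mean
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / len(ages))  # population sd


def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


print(f"z(48) = {z_score(48):.2f}")  # above the mean, so positive
print(f"z(16) = {z_score(16):.2f}")  # below the mean, so negative
print(f"z(mean) = {z_score(mu):.2f}")
```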

