Descriptive Statistics-IV (Measures of Variation)

1 Descriptive Statistics-IV (Measures of Variation)
QSCI 381 – Lecture 6 (Larson and Farber, Sect 2.4)

2 Deviation, Variance and Standard Deviation-I
The deviation of a data entry $x_i$ in a population data set is the difference between $x_i$ and the population mean $\mu$, i.e. $x_i - \mu$. The sum of the deviations over all entries is zero. The population variance is the mean of the squared deviations over all $N$ entries: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, where $\sigma$ is the Greek letter sigma.

3 Deviation, Variance and Standard Deviation-II
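The population standard deviation is the square root of the population variance, i.e.: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$. Note: these quantities relate to the population and not to a sample from the population. Note: the standard deviation is sometimes loosely referred to as the standard error; strictly, the standard error is the standard deviation of a statistic such as the sample mean.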

4 The Sample variance and Standard Deviation
The sample variance $s^2$ and the sample standard deviation $s$ of a data set with $n$ entries are given by: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ and $s = \sqrt{s^2}$. Note the division by $n - 1$ rather than $N$ or $n$.

5 Calculating Standard Deviations
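Step 1. Find the mean. Population: $\mu = \frac{\sum x_i}{N}$. Sample: $\bar{x} = \frac{\sum x_i}{n}$.
Step 2. Find the deviation for each entry. Population: $x_i - \mu$. Sample: $x_i - \bar{x}$.
Step 3. Square each deviation. Population: $(x_i - \mu)^2$. Sample: $(x_i - \bar{x})^2$.
Step 4. Add to get the sum of squares (SSx). Population: $SS_x = \sum (x_i - \mu)^2$. Sample: $SS_x = \sum (x_i - \bar{x})^2$.
Step 5. Divide by $N$ or $(n - 1)$ to get the variance. Population: $\sigma^2 = \frac{SS_x}{N}$. Sample: $s^2 = \frac{SS_x}{n - 1}$.
Step 6. Take the square root to get the standard deviation. Population: $\sigma = \sqrt{\sigma^2}$. Sample: $s = \sqrt{s^2}$.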

6 Example
Find the standard deviation of the following bowhead lengths (in m): (8.5, 8.4, 13.8, 9.3, 9.7). Key question (before doing anything): is this a sample or a population?
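The six steps from the previous slide can be sketched in Python for the bowhead lengths. This is a minimal illustration, not part of the lecture; the variable names are assumptions, and both divisors are shown so that either answer to the key question can be checked.

```python
import math

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]  # bowhead lengths in m, from the slide
n = len(lengths)

# Step 1: find the mean.
mean = sum(lengths) / n

# Steps 2-4: deviations, squared deviations, and their sum (SSx).
deviations = [x - mean for x in lengths]
ss_x = sum(d ** 2 for d in deviations)

# Step 5: divide by N (population) or n - 1 (sample) to get the variance.
pop_var = ss_x / n
sample_var = ss_x / (n - 1)

# Step 6: take the square root to get the standard deviation.
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print(f"mean = {mean:.2f} m")            # 9.94
print(f"population SD = {pop_sd:.2f} m")  # 1.99
print(f"sample SD = {sample_sd:.2f} m")   # 2.23
```

The sample standard deviation is the larger of the two because the same sum of squares is divided by 4 rather than 5.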

7 Formulae in EXCEL
Calculating means: =AVERAGE(A1:A10)
Calculating standard deviations: =STDEV(A1:A10). This calculates the sample and not the population standard deviation! Use =STDEVP(A1:A10) for the population standard deviation.

8 Standard Deviations-I
[Figure: three example data sets with increasing spread: SD = 0, SD = 2.1, SD = 5.3]

9 Standard Deviations-II (Symmetric Bell-shaped distributions)
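For symmetric bell-shaped distributions, about 68% of the data lie within one standard deviation of the mean (34% on each side), about 95% within two standard deviations (a further 13.5% on each side), and about 99.7% within three standard deviations.
Chebychev's Theorem: The proportion of the data lying within $k$ standard deviations ($k > 1$) of the mean is at least $1 - 1/k^2$, whatever the shape of the distribution. k = 2: proportion at least 75%. k = 3: proportion at least 88.9%.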
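As a quick check of Chebychev's Theorem, the sketch below compares the guaranteed minimum proportion with the observed proportion for the bowhead lengths from the earlier example. The function names are illustrative assumptions, and the five lengths are treated as a population so that the population standard deviation applies.

```python
import statistics

def chebychev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def proportion_within(data, k):
    """Observed proportion of entries within k population SDs of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]  # bowhead lengths in m

for k in (2, 3):
    print(k, chebychev_bound(k), proportion_within(lengths, k))
```

The bound is a guarantee, not an estimate: the observed proportion can be much higher than $1 - 1/k^2$, as it is for bell-shaped data.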

10 Standard Deviations-III (Grouped data)
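The (sample) standard deviation of a frequency distribution is: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{n - 1}}$, where $f_i$ is the frequency of bin $i$, $n = \sum f_i$ and $\bar{x} = \frac{\sum x_i f_i}{n}$. Note: where the frequency distribution consists of bins that are ranges, $x_i$ should be the midpoint of bin $i$ (be careful of the first and last bins).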
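A minimal sketch of the grouped-data calculation follows, assuming the sample form with divisor $n - 1$. The bin midpoints and frequencies are hypothetical values chosen for illustration, and `grouped_sd` is an assumed helper name.

```python
import math

def grouped_sd(midpoints, freqs):
    """Sample standard deviation of a frequency distribution.

    midpoints: x_i, the value (or bin midpoint) for each bin
    freqs:     f_i, the number of entries in each bin
    """
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    ss_x = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
    return math.sqrt(ss_x / (n - 1))

# Hypothetical bins of width 2 m: 6-8, 8-10, 10-12, 12-14.
midpoints = [7, 9, 11, 13]
freqs = [1, 3, 0, 1]

print(grouped_sd(midpoints, freqs))
```

The result equals the ordinary sample standard deviation of the expanded list (7, 9, 9, 9, 13), which is a convenient way to check the function.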

11 Standard Deviations-IV (The shortcut formula)
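The formula for this slide is not preserved in the transcript. A common shortcut form, assumed here, computes the sum of squares without first finding each deviation: $SS_x = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$. The sketch below checks that it agrees with the definitional sum of squares for the bowhead lengths; the function names are illustrative.

```python
import math

def ss_definition(data):
    """Sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

def ss_shortcut(data):
    """Assumed shortcut form: sum of x^2 minus (sum of x)^2 / n."""
    n = len(data)
    return sum(x ** 2 for x in data) - sum(data) ** 2 / n

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]  # bowhead lengths in m

# Sample standard deviation from the shortcut sum of squares.
sample_sd = math.sqrt(ss_shortcut(lengths) / (len(lengths) - 1))
print(ss_definition(lengths), ss_shortcut(lengths), sample_sd)
```

The shortcut is convenient for hand calculation, but it subtracts two large, similar numbers, so the definitional form is usually safer in floating-point arithmetic.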

12 The Coefficient of Variation
The coefficient of variation (CV) is the standard deviation divided by the mean, often expressed as a percentage: $CV = \frac{s}{\bar{x}} \times 100\%$. The coefficient of variation is dimensionless and can be used to compare among data sets based on different units.
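To illustrate why the coefficient of variation is dimensionless, the sketch below computes it for the bowhead lengths in metres and again after converting to centimetres. It assumes the sample standard deviation is the one intended; `coefficient_of_variation` is an illustrative name.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100

lengths_m = [8.5, 8.4, 13.8, 9.3, 9.7]     # bowhead lengths in m
lengths_cm = [x * 100 for x in lengths_m]  # the same lengths in cm

cv_m = coefficient_of_variation(lengths_m)
cv_cm = coefficient_of_variation(lengths_cm)
print(f"CV (m): {cv_m:.1f}%  CV (cm): {cv_cm:.1f}%")
```

Changing units rescales the mean and the standard deviation by the same factor, so the ratio is unchanged.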

13 Z-Scores
The standard (or Z) score is calculated using the equation: $z = \frac{x - \mu}{\sigma}$
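A short sketch of the Z-score calculation, using the bowhead lengths from the earlier example. Treating the five lengths as the whole population (so that $\mu$ and $\sigma$ apply) is an assumption made only for illustration.

```python
import statistics

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

lengths = [8.5, 8.4, 13.8, 9.3, 9.7]  # bowhead lengths in m
mu = statistics.mean(lengths)
sigma = statistics.pstdev(lengths)     # population standard deviation

z_scores = [z_score(x, mu, sigma) for x in lengths]
for x, z in zip(lengths, z_scores):
    print(f"{x:5.1f} m  z = {z:+.2f}")
```

Z-scores computed this way always have mean 0 and population standard deviation 1, which makes entries from different data sets comparable.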

14 Outliers-I
Outliers can lead to misinterpretation of results. They can arise because of data errors (e.g. typing measurements in cm rather than in m) or because of unusual events. There are several rules for identifying outliers:
Outliers: < Q2 - 6(Q2 - Q1); > Q2 + 6(Q3 - Q2)
Strays: < Q2 - 3(Q2 - Q1); > Q2 + 3(Q3 - Q2)

15 Outliers-II
Strays and outliers should be indicated on box and whisker plots. Consider the data set of bowhead lengths, except that a length of 1 is added!
[Figure: box and whisker plot of the bowhead lengths; axis: Length (m)]
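The stray and outlier rules can be sketched as follows for the bowhead lengths with a length of 1 added. The quartile convention is an assumption: the example uses Python's `statistics.quantiles` with `method="inclusive"`, and other conventions give slightly different quartiles and therefore different limits. `classify` is an illustrative helper name.

```python
import statistics

def classify(x, q1, q2, q3):
    """Label x as 'outlier', 'stray' or 'regular' using the rules on the slide."""
    if x < q2 - 6 * (q2 - q1) or x > q2 + 6 * (q3 - q2):
        return "outlier"
    if x < q2 - 3 * (q2 - q1) or x > q2 + 3 * (q3 - q2):
        return "stray"
    return "regular"

lengths = [8.5, 8.4, 13.8, 9.3, 9.7, 1.0]  # bowhead lengths in m, plus the added 1

# Quartiles under the assumed 'inclusive' convention.
q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")

labels = {x: classify(x, q1, q2, q3) for x in sorted(lengths)}
print(q1, q2, q3)
print(labels)
```

Under this convention the added length of 1 falls well below the lower outlier limit, as expected.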

16 Review of Symbols in this Lecture

17 Summary
We use descriptive statistics to “get a feel for the data” (also called “exploratory data analysis”). In general, we are using statistics from the sample to learn something about the population.

