Variability: Standard Deviation

Lecture 4 Variability: Standard Deviation

Variability Reminder
Variability: how spread out the scores are.
Range: how does the range of each of these distributions vary? Or the interquartile range?
Measure of error: is our sample similar to the population, OR is an individual score representative of its sample?

Standard Deviation
Standard deviation: the average distance of the scores on either side of the mean.
The goal of the SD is to measure the standard, or typical, distance from the mean.
Measuring every distance one at a time is not practical with a large N, so we compute the variance and standard deviation using equations.
Example: Mean = 70.8 in.
Ben is 66 in. tall. His deviation from the mean is -4.8.
James is 75 in. tall. His deviation from the mean is +4.2.

Standard Deviation
How much scores typically vary around the mean; a measure of dispersion.
Usually about 1/5 to 1/6 of the range.
Based on the mean, therefore it:
- requires at least interval data
- is sensitive to outliers
- accounts for all scores in a distribution
[Figure: frequency distribution (f) of scores with the mean M marked]

Logic of the Standard Deviation: Let’s start by looking at the population
Step 1: Find the deviation of each score from the mean, $X - \mu$. Be sure to include both the sign (+/-) and the number.
Keep the goal in mind: the standard, or typical, distance from the mean.

  X     $X - \mu$
  65    -14
  90    +11
  84     +5
  76     -3
  81     +2
  98    +19
  82     +3
  56    -23

$\mu = 79$
* Notice that the sum of the deviations = 0. This reflects the fact that the mean is a balancing point.
* Bonus: you can use this fact to check yourselves.
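The self-check in Step 1 can be sketched in a few lines of Python. This is only an illustration using the eight scores from the Step 1 table; the variable names are my own.

```python
# Scores from the Step 1 table (population of N = 8).
scores = [65, 90, 84, 76, 81, 98, 82, 56]

# Population mean: sum of the scores divided by N.
mu = sum(scores) / len(scores)

# Deviation of each score from the mean, sign included.
deviations = [x - mu for x in scores]

for x, d in zip(scores, deviations):
    print(f"{x:>3}  {d:+.0f}")

# The deviations sum to zero because the mean is the balancing point.
print("mean =", mu)
print("sum of deviations =", sum(deviations))
```

If the sum of your hand-computed deviations is not 0 (allowing for rounding), one of the deviations or the mean itself is wrong.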

Square each deviation and sum them = Sum of Squared Deviations = SS
Step 2: Remember, the standard deviation is meant to be the average of the deviations, but a plain average won’t work because the sum of our deviations = 0.
Solution: get rid of the signs (+/-) by squaring each deviation (not each raw score), then sum the squared deviations. The total is the Sum of Squared Deviations (SS).

  X     $X - \mu$    $(X - \mu)^2$
  65    -14.4        207.4
  90    +10.6        112.4
  84     +4.6         21.2
  76     -3.4         11.6
  81     +1.6          2.6
  98    +18.6        346.0
  82     +2.6          6.8
  59    -20.4        416.2

$\mu = 79.4$
* Sum of Squared Deviations: $SS = \sum (X - \mu)^2 = 1123.9$

Step 3: Calculate the mean squared deviation = SS / N
This value is called the variance and is represented by the symbol MS or $\sigma^2$. Variance will be important in inferential statistics, but it isn’t the best descriptive statistic: it’s hard to visualize variability with the variance alone.

  X     $X - \mu$    $(X - \mu)^2$
  65    -14.4        207.4
  90    +10.6        112.4
  84     +4.6         21.2
  76     -3.4         11.6
  81     +1.6          2.6
  98    +18.6        346.0
  82     +2.6          6.8
  59    -20.4        416.2

$\mu = 79.4$
* Sum of Squared Deviations: $SS = 1123.9$
$MS = \sigma^2 = SS / N = 1123.9 / 8 = 140.5$

Step 4: Correct for having squared all the deviations, because we want a value that corresponds easily to the mean and that we can visualize.
Standard deviation = the square root of the variance (the mean squared deviation).
For the same eight scores (65, 90, 84, 76, 81, 98, 82, 59), with $\mu = 79.4$ and $SS = 1123.9$:
$\sigma = \sqrt{\sigma^2} = \sqrt{SS / N} = \sqrt{140.5} = 11.9$
Conceptually, this is the average distance from the mean: on average, a random point pulled from this distribution will be 11.9 away from the mean.
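Steps 2 through 4 can be chained together as a short Python sketch. It uses the eight scores from the Step 2 to Step 4 tables; the variable names are illustrative, and the code keeps full precision instead of rounding the mean to 79.4 first.

```python
import math

# Scores from the Step 2 to Step 4 tables (population of N = 8).
scores = [65, 90, 84, 76, 81, 98, 82, 59]
n_scores = len(scores)

mu = sum(scores) / n_scores                  # population mean
ss = sum((x - mu) ** 2 for x in scores)      # Step 2: sum of squared deviations
variance = ss / n_scores                     # Step 3: mean squared deviation
sd = math.sqrt(variance)                     # Step 4: undo the squaring

print(f"mean = {mu:.1f}")
print(f"SS = {ss:.1f}")
print(f"variance = {variance:.1f}")
print(f"SD = {sd:.1f}")
```

Rounded to one decimal, these values agree with the slide totals (79.4, 1123.9, 140.5, 11.9).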

Putting it Together
$\sigma = 11.9$ for the same eight scores ($\mu = 79.4$, $SS = 1123.9$).
What can we say about a score that lies 12 points from the mean, ___ points?
What about a score that lies 30 points from the mean, 49 points?

Population Standard Deviation
REVIEW:
Variance = mean squared deviation, written with the lowercase Greek letter sigma, squared: $\sigma^2 = SS / N$
Standard deviation: $\sigma = \sqrt{SS / N}$
Computing SS:
- Definitional formula: $SS = \sum (X - \mu)^2$. Shows exactly how scores vary about the mean (like we just did). Works best with whole numbers.
- Computational formula: $SS = \sum X^2 - [(\sum X)^2 / N]$. Easier for calculations because it works directly with the scores, but it is less intuitive because the deviations from the mean never appear.
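To see that the two SS formulas agree, here is a small Python sketch. The function names are my own, and the data are the eight Step 1 scores, whose mean is a whole number (79).

```python
import math

def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)

def ss_computational(scores):
    """SS as (sum of X^2) minus (sum of X)^2 / N, using raw scores only."""
    sum_x = sum(scores)
    sum_x_squared = sum(x ** 2 for x in scores)
    return sum_x_squared - sum_x ** 2 / len(scores)

scores = [65, 90, 84, 76, 81, 98, 82, 56]
print(ss_definitional(scores), ss_computational(scores))
```

Both routes give the same SS; the computational version simply never forms the individual deviations.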

Formulas for Pop. SD and Variance
Variance (mean squared deviation): $\sigma^2 = SS / N$
Standard deviation: $\sigma = \sqrt{SS / N}$
Denoted by the Greek letters $\sigma$ and $\sigma^2$.

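Let’s Do It Together

  X     $X - \mu$    $(X - \mu)^2$    $X^2$
  24    -17.4        302.8             576
  24    -17.4        302.8             576
  28    -13.4        179.6             784
  32     -9.4         88.4            1024
  33     -8.4         70.6            1089
  48     +6.6         43.6            2304
  64    +22.6        510.8            4096
  42     +0.6          0.36           1764
  38     -3.4         11.6            1444
  67    +25.6        655.4            4489
  55    +13.6        185.0            3025

$N = 11$, $\sum X = 455$, $\mu = 41.4$, $\sum X^2 = 21171$, $(\sum X)^2 = 207025$
Definitional: $SS = \sum (X - \mu)^2 = 2351$
Computational: $SS = \sum X^2 - [(\sum X)^2 / N] = 21171 - (207025 / 11) = 2351$
$\sigma^2 = SS / N = 2351 / 11 = 213.7$
$\sigma = \sqrt{213.7} = 14.6$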
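A hand calculation like this one can be checked against Python’s standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N. One assumption is worth stating loudly: the score 24 is entered twice (N = 11), because that is what the slide’s totals of 455 for the sum of X and 21171 for the sum of X squared imply.

```python
import statistics

# Assumption: 24 occurs twice, giving N = 11 and a sum of 455.
scores = [24, 24, 28, 32, 33, 48, 64, 42, 38, 67, 55]

variance = statistics.pvariance(scores)   # population variance, SS / N
sd = statistics.pstdev(scores)            # population standard deviation

print(f"N = {len(scores)}, sum = {sum(scores)}")
print(f"variance = {variance:.1f}, SD = {sd:.1f}")
```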

Another Example…
Find $\sigma$ for the following numbers:
X: 10, 15, 17, 21, 24, 31, 13
Definitional: $SS = \sum (X - \mu)^2$
Computational: $SS = \sum X^2 - [(\sum X)^2 / N]$

Samples vs. Populations
Rationale: Inferential statistics rely on samples to draw general conclusions about the population.
PROBLEM: sample variability tends to be less than population variability. Thus, sample variability is biased: it underestimates the population variability.
[Figure: scores marked x, with spans labeled "pop. variability" and "sample variability"]

Terms
Biased: a sample statistic is said to be biased if, on average, it consistently underestimates or overestimates the population parameter.
Unbiased: a sample statistic is said to be unbiased if, on average, it is equal to the population parameter.

An Analogy for a Biased Stat
Imagine you were interested in studying learning in elementary school children. What if you chose as your sample child geniuses from a computer and science camp? Could you generalize from your sample to the population of elementary school children?
A sample statistic for the SD will be biased even with a representative sample, so we have to perform a correction.

Samples: s and $\sigma$
Changes in notation to reflect a sample: M replaces $\mu$, s replaces $\sigma$, and n replaces N.
To calculate SS (same as for a population):
(1) Find each deviation: $X - M$
(2) Square each deviation: $(X - M)^2$
(3) Sum the squared deviations: $SS = \sum (X - M)^2$
Correcting for the bias is done in the calculation of the mean squared deviation, or variance:
Sample variance: $s^2 = SS / (n - 1)$
Sample standard deviation: $s = \sqrt{SS / (n - 1)}$, or $s = \sqrt{s^2}$
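The sample procedure can be sketched as one small Python function. The function name is hypothetical, and for illustration only it treats the eight scores from the earlier population example as if they were a sample, so the effect of dividing by n - 1 instead of N is visible.

```python
import math

def sample_variance_and_sd(scores):
    """Return (SS, s^2, s) for a sample, dividing SS by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate variability")
    m = sum(scores) / n                        # sample mean M
    ss = sum((x - m) ** 2 for x in scores)     # SS: no bias correction here
    s_squared = ss / (n - 1)                   # the correction happens in the variance
    return ss, s_squared, math.sqrt(s_squared)

# Assumption: the earlier eight scores, treated as a sample.
sample = [65, 90, 84, 76, 81, 98, 82, 59]
ss, s_squared, s = sample_variance_and_sd(sample)
print(f"SS = {ss:.1f}, s^2 = {s_squared:.1f}, s = {s:.1f}")
```

SS is identical to the population calculation; only the divisor changes, which makes the sample variance and standard deviation slightly larger.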

Let’s Do it Together
[Figure: frequency histogram (f) of the scores]

  X     $X^2$
  4     16
  5     25
  6     36
  7     49
  8     64
  9     81

Totals over all n = 14 scores: $\sum X = 98$, $\sum X^2 = 714$
The smallest distance from the mean is 1 and the largest distance is 3, so the SD should be somewhere in between.
$SS = \sum X^2 - [(\sum X)^2 / n] = 714 - (98^2 / 14) = 28$
* NOTE: do not correct for bias in SS.
$s^2$ or $MS = SS / (n - 1) = 28 / 13 = 2.2$
$s = \sqrt{2.2} = 1.5$

Start Easy: Find s
X = 5, 1, 5, 5
X = 1, 7, 1, 1
$SS = \sum X^2 - [(\sum X)^2 / n]$
NOTE: do not correct for bias in SS.
$s^2$ or $MS = SS / (n - 1)$
$s = \sqrt{s^2}$

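A little more complex
$SS = \sum X^2 - [(\sum X)^2 / n]$
$MS$ or $s^2 = SS / (n - 1)$
$s = \sqrt{SS / (n - 1)}$
Given $\sum X = 5099.6$, so $(\sum X)^2 = 26005920.2$, with $\sum X^2 = 1698474.01$ and $n = 16$:
$SS = 1698474.01 - (26005920.2 / 16) = 73104.0$
$MS = 73104.0 / 15 = 4873.6$
$s = \sqrt{4873.6} = 69.8$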
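When only the summary totals are available, the arithmetic is easy to script. A minimal Python sketch using the totals quoted on this slide (the variable names are my own):

```python
import math

# Summary values from the slide.
n = 16
sum_x_squared = 1698474.01        # sum of X^2
sum_x_all_squared = 26005920.2    # (sum of X)^2

ss = sum_x_squared - sum_x_all_squared / n   # computational formula, no bias correction
ms = ss / (n - 1)                            # sample variance, s^2
s = math.sqrt(ms)                            # sample standard deviation

print(f"SS = {ss:.1f}, MS = {ms:.1f}, s = {s:.1f}")
```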

Sample Variability and Degrees of Freedom: Why do we correct with n - 1?
(1) The deviations computed from a sample are not “real” deviations.
- Sampling error: the sample and the population are close, but not exact.
- SS is smaller for the sample (there is a mathematical proof).
- Using a sample mean places a restriction on the variability.
[Table: deviations and squared deviations of the same scores computed two ways: SS = 17 where $\mu = 8$, and SS = 12 where M = 10]
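The claim that SS is smaller when it is computed around the sample mean can be sketched in a few lines of algebra. This is a standard identity, not something taken from the slides; it relies only on the fact that deviations from M sum to zero.

```latex
\begin{align*}
\sum (X - \mu)^2
  &= \sum \bigl[(X - M) + (M - \mu)\bigr]^2 \\
  &= \sum (X - M)^2 + 2(M - \mu)\sum (X - M) + n(M - \mu)^2 \\
  &= \sum (X - M)^2 + n(M - \mu)^2
     \qquad \text{because } \sum (X - M) = 0 \\
  &\ge \sum (X - M)^2 .
\end{align*}
```

So unless M happens to equal $\mu$ exactly, SS measured from the sample mean falls short of SS measured from the population mean, which is why dividing it by n would underestimate the population variance.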

More about n - 1
The sample mean is known before the deviations and SS can be computed.
Take a sample of n = 3 with M = 10. As soon as the first two values are given (X = 12, 8), you know the last value is 10.
n - 1 scores can vary; the last score is not free to vary.
[Table: deviations and squared deviations of the same scores computed two ways: SS = 17 where $\mu = 8$, and SS = 12 where M = 10]

Degrees of Freedom
df is commonly encountered as n - 1, where n is the number of scores in the sample.
It refers to the number of scores in a distribution that are free to vary once M and n are set.
Example: {5, 10, 15}; n = 3; M = 10
How many scores could you change and still have n = 3 and M = 10? n - 1, or 2.
So, $s^2 = SS / (n - 1) = SS / df$
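The idea that the last score is not free to vary can be made concrete with a short Python sketch. The helper name `last_score` is hypothetical; it solves for the one remaining score once n, M, and the other n - 1 scores are fixed.

```python
def last_score(free_scores, n, mean):
    """Return the one remaining score forced by a fixed n and mean."""
    if len(free_scores) != n - 1:
        raise ValueError("exactly n - 1 scores may be chosen freely")
    # The total must equal n * mean, so the last score is whatever is left over.
    return n * mean - sum(free_scores)

print(last_score([5, 10], n=3, mean=10))   # the original sample {5, 10, 15}
print(last_score([2, 20], n=3, mean=10))   # change two scores; the third is forced
print(last_score([12, 8], n=3, mean=10))   # the X = 12, 8 example
```

Any two scores can be chosen at will, but the third is always determined, which is what df = n - 1 = 2 expresses.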

Cafeteria degrees of freedom: An analogy
You are 4th in line at the cafeteria to choose your dessert. The choices are a cheesecake, a piece of fruit (an apple), pumpkin pie, and a stale cookie.
The first person chooses the cheesecake.
Next to go is the apple.
Then the pumpkin pie.
The last choice is restricted and can’t vary. You are stuck with the stale cookie.

Degrees of Freedom: Why n - 1?
Because you are estimating $\mu$ from M. Once this is done, the estimate is fixed and cannot be changed. Therefore, only n - 1 scores can vary around this fixed value.
This is the case whenever we are estimating a parameter from a statistic.

A little more about biased stats
Population: N = 6 (0, 0, 3, 3, 9, 9), $\mu = 4$, $\sigma^2 = 14$
Take all possible n = 2 samples.
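The exercise can be carried out exhaustively in Python. This sketch assumes the samples are drawn with replacement, so every ordered pair of population scores counts as one sample; the slide does not say how the samples are drawn. The helper names are my own.

```python
from itertools import product

population = [0, 0, 3, 3, 9, 9]
n = 2

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

# Assumption: sampling with replacement, so all 36 ordered pairs are used.
samples = list(product(population, repeat=n))

biased = [sum_of_squares(s) / n for s in samples]            # divide by n
unbiased = [sum_of_squares(s) / (n - 1) for s in samples]    # divide by n - 1

pop_variance = sum_of_squares(population) / len(population)

print("population variance:", pop_variance)
print("average of SS / n:", mean(biased))
print("average of SS / (n - 1):", mean(unbiased))
```

Under that assumption, the average of SS / n comes out at 7, half the population variance of 14, while the average of SS / (n - 1) equals 14. That is the sense in which dividing by n - 1 makes the sample variance unbiased.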

Properties of the Standard Deviation
Distribution:
- Homogeneous sample: data values are very similar = small $s^2$ and s.
- Heterogeneous sample: data values are dissimilar = big $s^2$ and s.
Helps make predictions about the amount of error in your sample: how close is your sample to the population?

Transforming scores: Adding or subtracting a constant does not change the SD
[Figure: frequency distribution (f) before and after adding a constant]
Another way to determine whether the SD is affected by a constant is to pick any two scores and calculate the distance between the two both before and after the constant is applied.
E.g., you and a friend compare scores on an exam: your friend earned an 85 and you earned a 90. Later you find out that a 5-point curve was added to everyone’s score. The scores are now 90 and 95, still 5 points apart, so the spread has not changed.

Transforming scores: Multiplying or dividing by a constant multiplies or divides the SD by that same constant
[Figure: frequency distributions (f) before and after multiplying by a constant]
Another way to determine whether the SD is affected by a constant is to pick any two scores and calculate the distance between the two both before and after the constant is applied.
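Both transformation rules can be checked numerically. The sketch below uses the two exam scores from the curve example and Python’s standard `statistics.pstdev`; the multiplier of 10 is a hypothetical choice for illustration.

```python
import statistics

scores = [85, 90]                      # the two exam scores from the example
curved = [x + 5 for x in scores]       # add a 5-point curve to every score
scaled = [x * 10 for x in scores]      # hypothetical: multiply every score by 10

sd_original = statistics.pstdev(scores)
sd_curved = statistics.pstdev(curved)
sd_scaled = statistics.pstdev(scaled)

print(sd_original, sd_curved, sd_scaled)
```

Adding the constant leaves the SD where it was, while multiplying by 10 makes the SD 10 times larger.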

Factors that affect Variability
Extreme scores:
- Range is most affected
- SD and variance are somewhat affected
- Semi-interquartile range (SIR) is not affected
Sample size:
- Range is directly related to sample size. This is unacceptable.
- SD, variance, and SIR are unaffected by sample size
Open-ended distributions:
- Cannot compute range, SD, or variance
- SIR is your only option

Relationship with other Statistics
- SD is derived using information about the mean (distances from it): the two go hand-in-hand.
- Interquartile range (and SIR) are based on percentiles, and so is the median (the mdn is the 50th percentile).
- Range has no direct relationship with any other statistical measure.

Why we need to know this information
Variability influences how easy it is to see patterns in our data.
Estimate M for each sample:
Sample 1: X = 34, 35, 36
Sample 2: X = 26, 10, 64, 40

Why we need to know this information
Keep the goal in mind: research uses samples to infer information about the population.
Consider the data from two experiments and determine whether or not there appears to be a consistent difference.
[Figure: frequency distributions (f) for Experiment 1 and Experiment 2, comparing talk therapy (M = 20) with meditation (M = 40)]

Graphical Representation of $\sigma$
[Figure: frequency distribution (f) of scores, annotated with $\sigma = 1.58$]

Graphic Representation - Box Plots
- Also called box-and-whisker plots
- Useful for comparing distributions and displaying variability
- The box defines the interquartile range
- The top line defines the third quartile
- The bottom line defines the first quartile
- Whiskers extend out to the highest and lowest scores
- The median is often displayed by a line

Graphic Representation - Boxplots

Pearson’s Coefficient of Skew
Pearson’s coefficient of skew tells us whether a distribution is positively or negatively skewed, and by how much (a value between -0.5 and +0.5 is approximately symmetric/normal).
s3 = [3(M - mdn)] / s
Example: M = 20, s = 5, mdn = 24
s3 = [3(20 - 24)] / 5
s3 = -2.4
Negatively skewed
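The coefficient is simple enough to wrap in a small Python helper. The function names and the default cutoff parameter are my own; the cutoff just encodes the rule of thumb of 0.5 quoted above.

```python
def pearson_skew(mean, median, sd):
    """Pearson's coefficient of skew: 3 * (mean - median) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / sd

def describe_skew(coefficient, cutoff=0.5):
    """Label the skew using a rule-of-thumb cutoff (0.5 by default)."""
    if coefficient > cutoff:
        return "positively skewed"
    if coefficient < -cutoff:
        return "negatively skewed"
    return "approximately symmetric"

sk = pearson_skew(mean=20, median=24, sd=5)
print(sk, describe_skew(sk))
```

The sign comes from the direction of M relative to the median: the mean is pulled toward the tail, so M below the median signals a negative skew.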

Try one
M = 50, Mdn = 30, s = 7
s3 = [3(M - mdn)] / s

Putting it all together…
X: 1 2 3 4 5 6 7 8 9 10 11 12 13
f: 1 2 4 5 6 9 11
Find Pearson’s coefficient of skew: s3 = [3(M - mdn)] / s
For this table, s = 2.74

Homework
Chapter 4: 1, 3, 4, 6, 8, 11, 12, 14, 19, 20, 23, 24, 25
Read IN THE LITERATURE pg
Skim Chapter 6 pages ; section on Probability.
** BRING YOUR TEXTBOOKS TO CLASS TOMORROW **
