Variability: Standard Deviation

Presentation transcript:

Lecture 4 - Variability: Standard Deviation

Variability
- Reminder: variability describes how spread out the scores are.
- Range: how does the range of each of these distributions vary? Or the interquartile range?
- Measure of error: is our sample similar to the population, OR is an individual score representative of its sample?

Standard Deviation
- Standard deviation: the average distance on either side of the mean.
- The goal of the SD is to measure the standard, or typical, distance from the mean.
- Measuring every distance by hand is not practical with a large N, so we need to estimate the variance and standard deviation using equations.
- Example: Mean = 70.8 in. Ben is 66 in. tall; his deviation from the mean is -4.8. James is 75 in. tall; his deviation from the mean is +4.2.

Standard Deviation
- How much scores typically vary around the mean; a measure of dispersion.
- Usually 1/5 to 1/6 of the range.
- Based on the mean, therefore it:
  - requires at least interval data
  - is sensitive to outliers
  - accounts for all scores in a distribution
[Figure: frequency histogram (f by score) with the mean M marked]

Logic of the Standard Deviation
Let’s start by looking at the population.
Step 1: Find the deviation of each score from the mean, X - μ. Be sure to include both the sign (+/-) and the number.

X     X - μ
65    -14
90    +11
84    +5
76    -3
81    +2
98    +19
82    +3
56    -23
μ = 79

* Notice that the sum of the deviations = 0. This reflects the fact that the mean is a balancing point.
* Bonus: you can use this fact to check yourselves.
Keep the goal in mind: the standard or typical distance from the mean.
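As a quick check of Step 1, the deviations can be computed and summed in a few lines of Python. This is only a sketch using the eight scores listed above; the variable names are illustrative and not part of the lecture.

```python
# Scores as listed for Step 1 (population of N = 8).
scores = [65, 90, 84, 76, 81, 98, 82, 56]

# Population mean: the balance point of the distribution.
mu = sum(scores) / len(scores)

# Signed deviation of each score from the mean (keep the +/- sign).
deviations = [x - mu for x in scores]

print("mean:", mu)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

Because the signed deviations always cancel out, their plain average cannot serve as a measure of spread.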

Step 2: Remember, the standard deviation is the average of the deviations, but a simple average won’t work because the sum of our deviations = 0.
Solution: get rid of the signs (+/-) by squaring each deviation. Then sum the squared deviations = Sum of Squared Deviations = SS.

X     X - μ    (X - μ)²
65    -14.4    207.4
90    10.6     112.4
84    4.6      21.2
76    -3.4     11.6
81    1.6      2.6
98    18.6     346.0
82    2.6      6.8
59    -20.4    416.2
μ = 79.4       SS = 1123.9

* Sum of Squared Deviations = SS

Step 3: Calculate the mean squared deviation = SS / N.
This value is called the variance and is represented with the symbol MS or σ².
MS = SS / N = 1123.9 / 8 = 140.5
Variance will be important for use in inferential stats methods, but it isn’t the best descriptive stat: it’s hard to visualize variability with the variance alone.

Step 4: Correct for having squared all the deviations, because we want a value that corresponds easily to the mean and that we can visualize.
Standard deviation = √variance = the square root of the mean squared deviation
σ = √140.5 = 11.9
Conceptually, this is the average distance from the mean: on average, a random point pulled from this distribution will be 11.9 away from the mean.
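Steps 1 through 4 can be traced end to end with a short Python sketch. It uses the eight scores from the table above (with 59 as the last score) and treats them as a whole population; the variable names are illustrative only.

```python
import math

# Population scores from the worked table (N = 8).
scores = [65, 90, 84, 76, 81, 98, 82, 59]
n = len(scores)
mu = sum(scores) / n

# Step 1: signed deviation of each score from the mean.
deviations = [x - mu for x in scores]

# Step 2: square each deviation and sum them (SS).
ss = sum(d ** 2 for d in deviations)

# Step 3: mean squared deviation = variance (divide by N for a population).
variance = ss / n

# Step 4: undo the squaring to get back to the original units.
sigma = math.sqrt(variance)

print("mean     =", round(mu, 1))
print("SS       =", round(ss, 1))
print("variance =", round(variance, 1))
print("sigma    =", round(sigma, 1))
```

Rounded to one decimal place, the results agree with the slide values: 79.4, 1123.9, 140.5, and 11.9.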

Putting it Together
σ = 11.9 (μ = 79.4)
- What can we say about a score that lies 12 points from the mean, at 91 points?
- What about a score that lies 30 points from the mean, at 49 points?

Population Standard Deviation
REVIEW:
- Variance = mean squared deviation: σ² = SS / N (σ is the Greek lowercase letter sigma)
- Standard deviation: σ = √(SS / N)
Computing SS:
- Definitional formula: SS = Σ(X - μ)². Shows exactly how scores vary about the mean (like we just did). Works best on whole numbers.
- Computational formula: SS = ΣX² - [(ΣX)² / N]. Easier for calculations because it works directly with the scores, but less intuitive about the mean.

Formulas for Pop. SD and Variance
- Variance = SS / N (mean squared deviation)
- Standard deviation = √(SS / N)
- Denoted by the Greek letters σ (standard deviation) and σ² (variance)

Let’s Do It Together

X     X - μ    (X - μ)²   X²
24    -17.4    302.8      576
24    -17.4    302.8      576
28    -13.4    179.6      784
32    -9.4     88.4       1024
33    -8.4     70.6       1089
48    6.6      43.6       2304
64    22.6     510.8      4096
42    0.6      .36        1764
38    -3.4     11.6       1444
67    25.6     655.4      4489
55    13.6     185        3025

N = 11, μ = 41.4
ΣX = 455, (ΣX)² = 207025, ΣX² = 21171
SS = 2351, σ² = 213.7, σ = 14.6

Definitional: SS = Σ(X - μ)²
Computational: SS = ΣX² - [(ΣX)² / N]
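The two SS formulas can be compared directly in Python. The sketch below assumes the score 24 occurs twice (N = 11), since that is what the slide totals ΣX = 455 and ΣX² = 21171 imply; the function names are illustrative, not from the lecture.

```python
import math

def ss_definitional(scores):
    """SS = sum of (X - mu)^2, computed from deviations about the mean."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)

def ss_computational(scores):
    """SS = sum(X^2) - (sum(X))^2 / N, computed directly from the scores."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n

# Assumption: 24 appears twice, which reproduces the totals on the slide.
scores = [24, 24, 28, 32, 33, 48, 64, 42, 38, 67, 55]

ss_def = ss_definitional(scores)
ss_comp = ss_computational(scores)
variance = ss_comp / len(scores)   # population variance: divide by N
sigma = math.sqrt(variance)

print("sum X =", sum(scores), " sum X^2 =", sum(x * x for x in scores))
print("SS (definitional)  =", round(ss_def, 1))
print("SS (computational) =", round(ss_comp, 1))
print("variance =", round(variance, 1), " sigma =", round(sigma, 1))
```

Both formulas give the same SS (about 2350.5). The slide's 2351 differs slightly because its deviations were rounded to one decimal place before squaring.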

Another Example…
Find σ for the following set of numbers, filling in the columns X, X², (ΣX)², σ², and σ:
X: 10, 15, 17, 21, 24, 31, 13
Definitional: SS = Σ(X - μ)²
Computational: SS = ΣX² - [(ΣX)² / N]

Samples vs. Populations
- Rationale: inferential statistics rely on samples to draw general conclusions about the population.
- PROBLEM: sample variability tends to be less than population variability. Thus, sample variability is a biased estimate; that is, it underestimates the pop. variability.
[Figure: population distribution with sampled scores (x) marked; labels: pop. variability, sample variability]

Terms
- Biased: a sample statistic is said to be biased if, on average, it consistently underestimates or overestimates the population parameter.
- Unbiased: a sample statistic is said to be unbiased if, on average, it is equal to the population parameter.

An Analogy for a Biased Stat
- Imagine you were interested in studying learning in elementary school children. What if you chose as your sample child geniuses from computer and science camp? Could you generalize from your sample to the population of elementary school children?
- A sample statistic for SD will be biased even with a representative sample, so we have to perform a correction.

Samples: s and σ
Changes in notation to reflect a sample: M for the mean, n for the number of scores, s for the standard deviation.
To calculate SS (same as for pop.):
(1) Find each deviation: X - M
(2) Square each deviation: (X - M)²
(3) Sum the squared deviations: SS = Σ(X - M)²
Correcting for the bias is done in the calculation of the mean squared deviation, or variance:
- Sample variance: s² = SS / (n - 1)
- Sample standard deviation: s = √(SS / (n - 1)), or s = √s²

Let’s Do it Together
[Figure: frequency histogram (f by X) for scores 4 through 9]

X     X²
4     16
5     25
6     36
7     49
8     64
9     81
Totals over all n = 14 scores: ΣX = 98, ΣX² = 714

The smallest distance from the mean is 1 and the largest distance is 3, so the SD should be somewhere in between.
SS = ΣX² - [(ΣX)² / n] = 714 - (98² / 14) = 28   * NOTE: do not correct for bias in SS
s² or MS = SS / (n - 1) = 28 / 13 = 2.2
s = √2.2 = 1.5

Start Easy: Find s
X = 5, 1, 5, 5
X = 1, 7, 1, 1
SS = ΣX² - [(ΣX)² / n]   NOTE: do not correct for bias in SS
s² or MS = SS / (n - 1)
s = √s²
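One way to check hand calculations for these two samples is a small Python helper that follows the same three lines (SS, then s², then s). The function name `sample_sd` is illustrative; `statistics.stdev` from the standard library, which also divides by n - 1, is used only as a cross-check.

```python
import math
import statistics

def sample_sd(scores):
    """Sample standard deviation via the computational SS formula."""
    n = len(scores)
    # SS is computed exactly as for a population: no bias correction here.
    ss = sum(x * x for x in scores) - sum(scores) ** 2 / n
    # The correction for bias happens when dividing: n - 1 instead of n.
    s_squared = ss / (n - 1)
    return math.sqrt(s_squared)

for sample in ([5, 1, 5, 5], [1, 7, 1, 1]):
    print(sample, "s =", sample_sd(sample),
          "(statistics.stdev:", statistics.stdev(sample), ")")
```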

A little more complex
ΣX = 5099.6, ΣX² = 1698474.01, n = 16
SS = ΣX² - [(ΣX)² / n] = 1698474.01 - (26005920.2 / 16) = 73104
MS or s² = SS / (n - 1) = 73104 / 15 = 4873.6
s = √(SS / (n - 1)) = 69.8
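When only the totals are available, as in this example, the computational formula needs nothing but ΣX, ΣX², and n. The following Python sketch shows that; the function name and argument names are illustrative assumptions, not part of the lecture.

```python
import math

def sample_stats_from_sums(sum_x, sum_x2, n):
    """Return (SS, s^2, s) for a sample, given only sum(X), sum(X^2) and n."""
    ss = sum_x2 - sum_x ** 2 / n      # computational formula for SS
    s_squared = ss / (n - 1)          # sample variance (MS), corrected with n - 1
    return ss, s_squared, math.sqrt(s_squared)

# Totals from the "little more complex" example (n = 16).
ss, ms, s = sample_stats_from_sums(5099.6, 1698474.01, 16)
print("SS =", round(ss, 1), " MS =", round(ms, 1), " s =", round(s, 1))

# Totals from the frequency-distribution example (n = 14).
ss2, ms2, s2 = sample_stats_from_sums(98, 714, 14)
print("SS =", round(ss2, 1), " MS =", round(ms2, 1), " s =", round(s2, 1))
```

Working from totals is convenient for a calculator, but with large scores it subtracts two big, nearly equal numbers, so rounding the intermediate values too early can distort SS.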

Sample Variability and Degrees of Freedom: Why do we correct with n - 1?
(1) The deviations computed from a sample are not “real” deviations.
- Sampling error: the sample and the pop. are close, but not exact.
- SS is smaller for the sample (there is a mathematical proof).
- Using a sample mean places a restriction on the variability.

Deviations from the population mean (μ = 8):
X     X - μ    (X - μ)²
12    +4       16
8     0        0
10    +2       4
SS = 20

Deviations from the sample mean (M = 10):
X     X - M    (X - M)²
12    +2       4
8     -2       4
10    0        0
SS = 8

More about n - 1
- The sample mean is known before the deviations and SS can be computed.
- Take a sample of n = 3 with M = 10. As soon as the first two values are given (X = 12, 8), you know the last value is 10.
- n - 1 scores can vary; the last score is not free to vary.

Degrees of Freedom
- df is commonly encountered as n - 1, where n is the number of scores in the sample.
- It refers to the number of scores in a distribution that are free to vary once M and n are set.
- Example: {5, 10, 15}; n = 3; M = 10. How many scores could you change and still have n = 3 and M = 10? n - 1, or 2.
- So, s² = SS / (n - 1) = SS / df
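The idea that the last score is not free to vary can be made concrete: once n and M are fixed, the final score is forced to be n × M minus the sum of the others. A minimal Python sketch, with a hypothetical helper name `forced_last_score`:

```python
def forced_last_score(free_scores, n, mean):
    """Return the one score that is determined once n, M and n - 1 scores are set."""
    if len(free_scores) != n - 1:
        raise ValueError("exactly n - 1 scores are free to vary")
    # The total must equal n * M, so the last score is whatever is left over.
    return n * mean - sum(free_scores)

# n = 3 and M = 10: pick any two scores, and the third is fixed.
print(forced_last_score([12, 8], n=3, mean=10))
print(forced_last_score([5, 10], n=3, mean=10))
print(forced_last_score([1, 2], n=3, mean=10))
```

Whatever the first two scores are, there is exactly one value the third can take, which is why df = n - 1 = 2 here.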

Cafeteria degrees of freedom: An analogy
- You are 4th in line at the cafeteria to choose your dessert. The choices are a cheesecake, a piece of fruit (an apple), pumpkin pie, and a stale cookie.
- The first person chooses the cheesecake. Next to go is the apple. Then the pumpkin pie.
- The last choice is restricted and can’t vary. You are stuck with the stale cookie.

Degrees of Freedom
- Why n - 1? Because you are estimating μ from M. Once this is done, the estimate is fixed and cannot be changed.
- Therefore, only n - 1 scores can vary around this fixed value.
- This is the case whenever we are estimating a parameter from a statistic.

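A little more about biased stats
- Population: N = 6 (0, 0, 3, 3, 9, 9); μ = 4, σ² = 14
- Take all possible n = 2 samples.
- Totals across the 9 equally likely pairs of values: sample means = 36, biased variances (SS / n) = 63, unbiased variances (SS / (n - 1)) = 126.
- Dividing each total by 9 gives 4, 7, and 14: the sample mean and SS / (n - 1) average to μ = 4 and σ² = 14, while SS / n averages only 7.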
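The bias can be seen by brute force. The sketch below assumes sampling with replacement and treats the population as three equally likely values (0, 3, 9), which gives 9 possible ordered samples of n = 2; under that reading the totals come out to the three numbers shown on the slide (36, 63, 126). The helper names are illustrative.

```python
from itertools import product

# Assumption: the population 0, 0, 3, 3, 9, 9 is three equally likely values.
values = [0, 3, 9]
samples = list(product(values, repeat=2))   # all 9 ordered samples of n = 2

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

sample_means = [mean(s) for s in samples]
biased_vars = [ss(s) / len(s) for s in samples]          # divide by n
unbiased_vars = [ss(s) / (len(s) - 1) for s in samples]  # divide by n - 1

print("totals  :", sum(sample_means), sum(biased_vars), sum(unbiased_vars))
print("averages:", mean(sample_means), mean(biased_vars), mean(unbiased_vars))
```

Averaged over every possible sample, the sample mean equals μ = 4 and the n - 1 variance equals σ² = 14, while the divide-by-n variance averages 7, half the true value.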

Properties of the Standard Deviation
Distribution:
- Homogeneous sample: data values are very similar = small s² and s.
- Heterogeneous sample: data values are dissimilar = big s² and s.
- Helps make predictions about the amount of error in your sample: how close is your sample to the population?

Properties of the Standard Deviation
Transforming scores: adding or subtracting a constant does not change the SD.
[Figure: frequency histogram (f by score) shifted along the score axis by a constant]
Another way to determine whether the SD is affected by a constant is to pick any two scores and calculate the distance between the two, both before and after the constant is applied. E.g., you and a friend compare scores on an exam: your friend earned an 85 and you earned a 90. Later you find out that a 5-point curve was added to everyone’s score. The scores are now 90 and 95, still 5 points apart.

Properties of the Standard Deviation
Transforming scores: multiplying or dividing every score by a constant multiplies or divides the SD by that same constant.
[Figure: two frequency histograms, one for the scores 1, 2, 3, 4 and one for the same scores multiplied by 10 (10, 20, 30, 40)]
Another way to determine whether the SD is affected by a constant is to pick any two scores and calculate the distance between the two, both before and after the constant is applied.
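Both transformation properties are easy to confirm numerically. This Python sketch uses the scores 1, 2, 3, 4 from the figure, a shift of 5 points, and a multiplier of 10; `statistics.pstdev` is the standard-library population standard deviation, and the variable names are illustrative.

```python
import math
import statistics

scores = [1, 2, 3, 4]

sd_original = statistics.pstdev(scores)
sd_shifted = statistics.pstdev([x + 5 for x in scores])    # add a constant
sd_scaled = statistics.pstdev([x * 10 for x in scores])    # multiply by a constant

print("original SD     :", sd_original)
print("after adding 5  :", sd_shifted)
print("after times 10  :", sd_scaled)
print("shift leaves SD unchanged:", math.isclose(sd_shifted, sd_original))
print("scaling multiplies SD    :", math.isclose(sd_scaled, 10 * sd_original))
```

Adding a constant slides every score and the mean by the same amount, so the distances are untouched; multiplying stretches every distance by the constant.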

Factors that affect Variability
- Extreme scores: range is most affected; SD and variance are somewhat affected; SIR (semi-interquartile range) is not affected.
- Sample size: range is directly related to sample size, which is unacceptable; SD, variance, and SIR are unaffected by sample size.
- Open-ended distributions: cannot compute range, SD, or variance; SIR is your only option.

Relationship with other Statistics
- SD is derived using information about the mean (distances from it); the two go hand in hand.
- The interquartile range (and SIR) is based on percentiles, and so is the median (the mdn is the 50th percentile).
- Range has no direct relationship with any other statistical measure.

Why we need to know this information
Variability influences how easy it is to see patterns in our data. Estimate M for each sample:
Sample 1: X = 34, 35, 36
Sample 2: X = 26, 10, 64, 40

Why we need to know this information
Keep the goal in mind: research uses samples to deduce information about the population.
Consider the data from two experiments and determine whether or not there appears to be a consistent difference.
[Figure: frequency distributions (f by score) for Experiment 1 and Experiment 2; Talk therapy M = 20, Meditation M = 40]

Graphical Representation of σ = 1.58
[Figure: frequency histogram (f by score)]

Graphic Representation - Box Plots
- Also called box-and-whisker plots.
- Useful for comparing distributions and displaying variability.
- The box defines the interquartile range.
- The top line defines the third quartile.
- The bottom line defines the first quartile.
- The whiskers extend out to the highest and lowest scores.
- The median is often displayed by a line.
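The numbers a box plot draws can be computed without plotting anything. The sketch below reuses the eight scores from the earlier worked example and relies on `statistics.quantiles` from the Python standard library (Python 3.8+) with its default method; textbooks use several slightly different quartile conventions, so hand-computed quartiles may differ a little.

```python
import statistics

scores = [65, 90, 84, 76, 81, 98, 82, 59]

# Three cut points that split the sorted scores into four groups.
q1, median, q3 = statistics.quantiles(scores, n=4)

box = {
    "lowest score (bottom whisker)": min(scores),
    "first quartile (bottom of box)": q1,
    "median (line inside box)": median,
    "third quartile (top of box)": q3,
    "highest score (top whisker)": max(scores),
}
iqr = q3 - q1   # height of the box

for label, value in box.items():
    print(f"{label}: {value}")
print("interquartile range:", iqr)
```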

Graphic Representation - Boxplots

Pearson’s Coefficient of Skew
Pearson’s coefficient of skew tells us whether a distribution is positively or negatively skewed, and by how much (a value within +/- 0.5 is approximately symmetric/normal).
s3 = [3(M - mdn)] / s
Example: M = 20, s = 5, mdn = 24
s3 = [3(20 - 24)] / 5
s3 = -2.4
Negatively skewed
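The skew formula translates directly into a small Python function. The name `pearson_skew` and the wording of the labels are illustrative; the +/- 0.5 cutoff is the rule of thumb stated on the slide.

```python
def pearson_skew(mean, median, sd):
    """Pearson's coefficient of skew: 3 * (M - mdn) / s."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return 3 * (mean - median) / sd

def describe_skew(coefficient, cutoff=0.5):
    """Label the direction of skew using the +/- 0.5 rule of thumb."""
    if abs(coefficient) <= cutoff:
        return "approximately symmetric"
    return "positively skewed" if coefficient > 0 else "negatively skewed"

# Worked example from the slide: M = 20, mdn = 24, s = 5.
coefficient = pearson_skew(mean=20, median=24, sd=5)
print(coefficient, describe_skew(coefficient))
```

A mean below the median produces a negative coefficient, because the tail of low scores pulls the mean down more than it pulls the median.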

Try one
M = 50, Mdn = 30, s = 7
s3 = [3(M - mdn)] / s

Putting it all together…
X: 1 2 3 4 5 6 7 8 9 10 11 12 13
f: 1 2 4 5 6 9 11
Find Pearson’s coefficient of skew: s3 = [3(M - mdn)] / s
For this table, s = 2.74

Homework: Chapter 4: 1, 3, 4, 6, 8, 11, 12, 14, 19, 20, 23, 24, 25
Read IN THE LITERATURE, pp. 122-123.
Skim Chapter 6, pages 161-166; section on Probability.
** BRING YOUR TEXT BOOKS TO CLASS TOMORROW **