Descriptive Statistics: Overview
Measures of Center: Mode, Median, Mean
Measures of Symmetry: Skewness
Measures of Spread: Range, Interquartile Range, Variance, Standard Deviation
Measures of Position: Percentile, Deviation Score, Z-score
Central tendency
Seeks to provide a single value that best represents a distribution. Typical measures are:
– mode
– median
– mean
Mode
– The most frequently occurring score value; it corresponds to the highest point on the frequency distribution. For a given sample (N = 16): the mode = 39.
– The mode is not sensitive to extreme scores.
– A distribution may have more than one mode. For a given sample (N = 16): the modes = 35 and 39.
– There may be no unique mode, as in the case of a rectangular distribution. For a given sample (N = 16): no unique mode.
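As a rough illustration of the three cases above, the sketch below uses `statistics.multimode` from the Python standard library. The three samples are hypothetical (the slides' original data are not available here); they were chosen only so that the results match the modes quoted above.

```python
from statistics import multimode

# Hypothetical samples with N = 16 each (not the original slide data).
one_mode = [33, 35, 35, 36, 37, 37, 38, 39, 39, 39, 39, 40, 41, 42, 43, 45]
two_modes = [33, 35, 35, 35, 36, 37, 37, 38, 39, 39, 39, 40, 41, 42, 43, 45]
rectangular = [35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42]

# multimode returns every value that shares the highest frequency.
print(multimode(one_mode))     # a single mode
print(multimode(two_modes))    # two modes
print(multimode(rectangular))  # every value ties, so there is no unique mode
```

When every value is returned, as in the rectangular sample, the mode carries no information about the center.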
Median
– The score value that cuts the distribution in half (the “middle” score); the 50th percentile.
– For N = 15, the median is the eighth score = 37.
– For N = 16, the median is the average of the eighth and ninth scores = 37.5.
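The odd and even cases can be checked with a small sketch. The scores are hypothetical consecutive integers, chosen so that the eighth score is 37 as in the examples above.

```python
from statistics import median

# Hypothetical ranked scores (not the original slide data).
scores_15 = list(range(30, 45))  # 30..44, N = 15
scores_16 = list(range(30, 46))  # 30..45, N = 16

# Odd N: the median is the middle (eighth) score.
median_odd = median(scores_15)

# Even N: the median is the average of the eighth and ninth scores.
median_even = median(scores_16)

print(median_odd, median_even)
```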
Mean
– This is what people usually have in mind when they say “average”: the sum of the scores divided by the number of scores.
– Changing the value of a single score may not affect the mode or median, but it will affect the mean.
For a population: \( \mu = \frac{\sum X}{N} \)
For a sample: \( \bar{X} = \frac{\sum X}{N} \)
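Mean
– In many cases the mean is the preferred measure of central tendency, both as a description of the data and as an estimate of the parameter. [Figure: frequency distribution of scores, \( \bar{X} = 7.07 \)]
– For the mean to be meaningful, the variable of interest must be measured on at least an interval scale. [Figure: frequency distribution over the categories Buddhist, Protestant, Catholic, Jewish, Muslim, \( \bar{X} = 2.4 \); a mean of category codes like this has no meaningful interpretation]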
Mean
The mean is sensitive to extreme scores and is appropriate for more symmetrical distributions. [Figures: three example distributions with \( \bar{X} = 36.8 \), \( \bar{X} = 36.5 \), and \( \bar{X} = 93.2 \)]
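To make the sensitivity to extreme scores concrete, here is a minimal sketch with hypothetical data: one score is replaced by an extreme value, and the three measures of center are recomputed.

```python
from statistics import mean, median, mode

# Hypothetical sample (not the original slide data).
original = [35, 36, 36, 37, 37, 37, 38, 38, 39, 40]

# Same sample with the largest score replaced by an extreme value.
with_outlier = original[:-1] + [400]

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(label, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

In this example the mode and median stay at 37, while the mean moves from 37.3 to 73.3.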
Symmetry
– A symmetrical distribution exhibits no skewness.
– In a symmetrical, unimodal distribution, mean = median = mode.
Skewed distributions
Skewness refers to the asymmetry of the distribution.
– A positively skewed distribution is asymmetrical, with its longer tail pointing in the positive direction; typically mode < median < mean. Example: mode = $70,000, median = $88,700, mean = $93,600.
– A negatively skewed distribution is asymmetrical, with its longer tail pointing in the negative direction; typically mode > median > mean.
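The ordering of the three measures can be sketched with a small hypothetical sample (values in thousands, not the income data quoted above). Reflecting the sample produces a negatively skewed version with the reverse ordering.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed sample: a long tail toward high values.
positive_skew = [30, 40, 40, 40, 50, 50, 60, 70, 90, 130]

# Reflecting every score around 160 flips the tail to the low side.
negative_skew = [160 - x for x in positive_skew]

pos = (mode(positive_skew), median(positive_skew), mean(positive_skew))
neg = (mode(negative_skew), median(negative_skew), mean(negative_skew))

print("positive skew (mode, median, mean):", pos)
print("negative skew (mode, median, mean):", neg)
```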
Measures of central tendency

          Advantages (+)                     Disadvantages (-)
Mode      quick & easy to compute;           poor sampling stability
          useful for nominal data
Median    not affected by extreme scores     somewhat poor sampling stability
Mean      sampling stability;                inappropriate for discrete data;
          related to variance                affected by skewed distributions
Distributions
– Center: mode, median, mean
– Shape: symmetrical, skewed
– Spread
Measures of Spread
– The dispersion of scores from the center.
– A distribution of scores is highly variable if the scores differ wildly from one another.
– Three statistics to measure variability: range, interquartile range, variance.
Range
– Largest score minus the smallest score.
– The two example distributions have the same range (80), but their spreads look different.
– Says nothing about how scores vary around the center.
– Greatly affected by extreme scores (it is defined by them).
Interquartile range
– The distance between the 25th percentile and the 75th percentile (Q3 - Q1).
– In the two example distributions, Q3 - Q1 = 40 and Q3 - Q1 = 5.
– Effectively ignores the top and bottom quarters, so extreme scores are not influential.
– Dismisses 50% of the distribution.
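The contrast between range and interquartile range can be sketched as follows. Both samples are hypothetical, built so that they share a range of 80 but have interquartile ranges of 40 and 5. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of `statistics.quantiles`.

```python
from statistics import quantiles

# Hypothetical samples (not the original slide data).
wide = [10, 20, 30, 40, 50, 60, 70, 80, 90]
narrow = [10, 45, 48, 49, 50, 51, 53, 55, 90]

def value_range(data):
    # Largest score minus the smallest score.
    return max(data) - min(data)

def iqr(data):
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

for label, data in [("wide", wide), ("narrow", narrow)]:
    print(label, "range:", value_range(data), "IQR:", iqr(data))
```

The two extreme scores (10 and 90) fix the range of both samples, but they have no influence on either interquartile range.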
Deviation measures
It might be better to see how much scores differ from the center of the distribution, using distance. Scores further from the mean have larger deviation scores (in absolute value).
To see how “deviant” the distribution is relative to another, we could sum these deviation scores, but this would leave us with a big fat zero. So we use squared deviations from the mean.

            Score   Deviation   Sq. Deviation
Amy           10       -40          1600
Theo          20       -30           900
Max           30       -20           400
Henry         40       -10           100
Leticia       50         0             0
Charlotte     60        10           100
Pedro         70        20           400
Tricia        80        30           900
Lulu          90        40          1600
AVERAGE       50
SUM                      0          6000

The sum of the squared deviations is the sum of squares (SS): \( SS = \sum (X - \bar{X})^2 \)
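The deviation table can be reproduced with a short sketch using the nine scores from the slides. The variable names are illustrative only.

```python
# Scores for the nine students, as listed on the slides.
scores = {
    "Amy": 10, "Theo": 20, "Max": 30, "Henry": 40, "Leticia": 50,
    "Charlotte": 60, "Pedro": 70, "Tricia": 80, "Lulu": 90,
}

mean_score = sum(scores.values()) / len(scores)

# Deviation score: distance of each score from the mean (signed).
deviations = {name: x - mean_score for name, x in scores.items()}

# Raw deviations always sum to zero, so they are squared before summing.
sum_of_deviations = sum(deviations.values())
sum_of_squares = sum(d ** 2 for d in deviations.values())

print("mean:", mean_score)
print("sum of deviations:", sum_of_deviations)
print("sum of squares (SS):", sum_of_squares)
```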
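Variance
We take the “average” squared deviation from the mean and call it VARIANCE.
For a population: \( \sigma^2 = \frac{\sum (X - \mu)^2}{N} \)
For a sample: \( s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \)
(Dividing by N - 1 corrects for the fact that a sample variance computed with N tends to underestimate the population variance.)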
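Variance
1. Find the mean.
2. Subtract the mean from every score.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the SS by N (population) or N - 1 (sample).

            Score   Dev’n   Sq. Dev.
Amy           10     -40      1600
Theo          20     -30       900
Max           30     -20       400
Henry         40     -10       100
Leticia       50       0         0
Charlotte     60      10       100
Pedro         70      20       400
Tricia        80      30       900
Lulu          90      40      1600
SUM                    0      6000

Sample variance: 6000 / 8 = 750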
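The five steps can be written out directly, as in the sketch below, and cross-checked against `statistics.variance` (which divides by N - 1) and `statistics.pvariance` (which divides by N).

```python
from statistics import pvariance, variance

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)

mean_score = sum(scores) / n                    # 1. find the mean
deviations = [x - mean_score for x in scores]   # 2. subtract the mean from every score
squared = [d ** 2 for d in deviations]          # 3. square the deviations
ss = sum(squared)                               # 4. sum the squared deviations
population_variance = ss / n                    # 5a. divide by N (population)
sample_variance = ss / (n - 1)                  # 5b. divide by N - 1 (sample)

print("SS:", ss)
print("population variance:", population_variance, "library:", pvariance(scores))
print("sample variance:", sample_variance, "library:", variance(scores))
```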
Standard deviation
– The standard deviation is the square root of the variance.
– The standard deviation measures spread in the original units of measurement, while the variance does so in units squared.
– Variance is good for inferential stats. Standard deviation is nice for descriptive stats.
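A brief sketch of the relationship, using the nine scores from the variance example and the sample variance of 750 computed there:

```python
import math
from statistics import stdev

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

sample_variance = 750  # in squared score units (6000 / 8)

# The square root brings the measure back to the original score units.
sample_sd = math.sqrt(sample_variance)

print("standard deviation:", round(sample_sd, 2))
print("library stdev:     ", round(stdev(scores), 2))
```

A standard deviation of about 27.39 is on the same scale as the scores themselves (10 to 90), which is why it reads more naturally as a description of the data than the variance does.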
Example: N = 28, \( \bar{X} = 50 \), \( s = 11.86 \), so \( s^2 = 11.86^2 \approx 140.66 \).
Descriptive Statistics: Quick Review
Measures of Center: Mode, Median, Mean
Measures of Symmetry: Skewness
Measures of Spread: Range, Interquartile Range, Variance, Standard Deviation
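Descriptive Statistics: Quick Review

                     For a population                                   For a sample
Mean                 \( \mu = \frac{\sum X}{N} \)                       \( \bar{X} = \frac{\sum X}{N} \)
Variance             \( \sigma^2 = \frac{\sum (X - \mu)^2}{N} \)        \( s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \)
Standard Deviation   \( \sigma = \sqrt{\sigma^2} \)                     \( s = \sqrt{s^2} \)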
Exercise
Treat this little distribution as a sample and calculate:
– Mode, median, mean
– Range, variance, standard deviation
Measures of Position
How to describe a data point in relation to its distribution:
– Quantile
– Deviation Score
– Z-score
Quantiles
– Quartile: divides ranked scores into four equal parts of 25% each, with cut points running from the minimum through the median to the maximum.
– Decile: divides ranked scores into ten equal parts of 10% each.
– Percentile: divides ranked scores into 100 equal parts.
Percentile rank:
\[ \text{percentile rank of score } x = 100 \times \frac{\text{number of scores less than } x}{\text{total number of scores}} \]
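The percentile rank formula translates almost word for word into code. The function name `percentile_rank` is illustrative, and the data are the nine scores used elsewhere in these slides.

```python
def percentile_rank(scores, x):
    """Percentage of scores strictly less than x (the definition used above)."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

for x in (10, 50, 90):
    print(x, "->", round(percentile_rank(scores, x), 1))
```

Note that other definitions of percentile rank exist (for example, counting half of the tied scores); this sketch follows only the "less than" definition given here.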
Deviation Scores

            Score   Deviation
Amy           10       -40
Theo          20       -30
Max           30       -20
Henry         40       -10
Leticia       50         0
Charlotte     60        10
Pedro         70        20
Tricia        80        30
Lulu          90        40
Average       50

For a population: deviation score = \( X - \mu \)
For a sample: deviation score = \( X - \bar{X} \)
What if we want to compare scores from distributions that have different means and standard deviations?
Example:
– Nine students’ scores on two different tests
– Tests scored on different scales
Nine Students on Two Tests

            Test 1   Test 2   Deviation Score 1   Deviation Score 2
Amy           10       1            -40                  -4
Theo          20       2            -30                  -3
Max           30       3            -20                  -2
Henry         40       4            -10                  -1
Leticia       50       5              0                   0
Charlotte     60       6             10                   1
Pedro         70       7             20                   2
Tricia        80       8             30                   3
Lulu          90       9             40                   4
Average       50       5
Z-Scores
Z-scores modify a distribution so that it is centered on 0 with a standard deviation of 1. Subtract the mean from a score, then divide by the standard deviation.
For a population: \( z = \frac{X - \mu}{\sigma} \)
For a sample: \( z = \frac{X - \bar{X}}{s} \)
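Z-Scores
(Z-scores below are computed with the sample standard deviation, \( s = \sqrt{750} \approx 27.39 \) for Test 1.)

            Test 1   Test 2   Z-Score 1   Z-Score 2
Amy           10       1        -1.46       -1.46
Theo          20       2        -1.10       -1.10
Max           30       3        -0.73       -0.73
Henry         40       4        -0.37       -0.37
Leticia       50       5         0           0
Charlotte     60       6         0.37        0.37
Pedro         70       7         0.73        0.73
Tricia        80       8         1.10        1.10
Lulu          90       9         1.46        1.46
Average       50       5         0           0
St Dev      27.39    2.74        1           1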
Z-Scores
A distribution of Z-scores:
– Always has a mean of zero.
– Always has a standard deviation of 1.
Converting to standard or z scores does not change the shape of the distribution: z scores cannot normalize a non-normal distribution.
A Z-score is interpreted as “number of standard deviations above/below the mean”.
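These properties can be checked on the two tests from the earlier example. This is a minimal sketch that assumes the sample formula (division by N - 1, via `statistics.stdev`); the helper name `z_scores` is illustrative.

```python
from statistics import mean, stdev

test1 = [10, 20, 30, 40, 50, 60, 70, 80, 90]
test2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def z_scores(values):
    # Subtract the mean from each score, then divide by the standard deviation.
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

z1 = z_scores(test1)
z2 = z_scores(test2)

print("z-scores, test 1:", [round(z, 2) for z in z1])
print("z-scores, test 2:", [round(z, 2) for z in z2])
print("mean of z:", round(mean(z1), 10), "st dev of z:", round(stdev(z1), 10))
```

Although the tests are scored on different scales, each student ends up with the same z-score on both, so the scores become directly comparable.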
Exercise
On their third test, the class average was 45 and the standard deviation was 6. Fill in the rest.

          Test 3   Z-Score
Amy         52
Theo        39
Max                  -1.5
Henry                 1.3
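Descriptive Statistics: Quick Review

                     For a population                                   For a sample
Mean                 \( \mu = \frac{\sum X}{N} \)                       \( \bar{X} = \frac{\sum X}{N} \)
Variance             \( \sigma^2 = \frac{\sum (X - \mu)^2}{N} \)        \( s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \)
Standard Deviation   \( \sigma = \sqrt{\sigma^2} \)                     \( s = \sqrt{s^2} \)
Z-score              \( z = \frac{X - \mu}{\sigma} \)                   \( z = \frac{X - \bar{X}}{s} \)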
Messing with Units
If you add or subtract a constant from each value in a distribution, then:
– the mean is increased/decreased by that amount
– the standard deviation is unchanged
– the z-scores are unchanged
If you multiply or divide each value in a distribution by a (positive) constant, then:
– the mean is multiplied/divided by that amount
– the standard deviation is multiplied/divided by that amount
– the z-scores are unchanged
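Example
(The standard deviation of 1.94 corresponds to dividing the sum of squares, 30, by N = 8.)

            Score   Dev’s   Sq dev   Z-score
Theo          5      -1       1       -0.52
Max           3      -3       9       -1.55
Henry         5      -1       1       -0.52
Leticia       7       1       1        0.52
Charlotte     7       1       1        0.52
Pedro         8       2       4        1.03
Tricia        4      -2       4       -1.03
Lulu          9       3       9        1.55
MEAN          6
STDEV       1.94

Adding 1

            Score   Dev’s   Sq dev   Z-score
Theo          6      -1       1       -0.52
Max           4      -3       9       -1.55
Henry         6      -1       1       -0.52
Leticia       8       1       1        0.52
Charlotte     8       1       1        0.52
Pedro         9       2       4        1.03
Tricia        5      -2       4       -1.03
Lulu         10       3       9        1.55
MEAN          7
STDEV       1.94

Multiplying by 10

            Score   Dev’s   Sq dev   Z-score
Theo         50     -10      100      -0.52
Max          30     -30      900      -1.55
Henry        50     -10      100      -0.52
Leticia      70      10      100       0.52
Charlotte    70      10      100       0.52
Pedro        80      20      400       1.03
Tricia       40     -20      400      -1.03
Lulu         90      30      900       1.55
MEAN         60
STDEV       19.4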
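The effect of adding a constant or multiplying by a constant can be verified with a short sketch. It uses the eight scores from the example and assumes the population formula (`statistics.pstdev`, dividing by N), since that is what reproduces the standard deviation of 1.94 shown above.

```python
from statistics import mean, pstdev

scores = [5, 3, 5, 7, 7, 8, 4, 9]
plus_one = [x + 1 for x in scores]     # add a constant
times_ten = [x * 10 for x in scores]   # multiply by a constant

def z_scores(values):
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]

for label, values in [("original", scores), ("+1", plus_one), ("x10", times_ten)]:
    print(label,
          "mean:", mean(values),
          "st dev:", round(pstdev(values), 2),
          "z:", [round(z, 2) for z in z_scores(values)])
```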
Other Standardized Distributions The Z distribution is not the only standardized distribution. You can easily create others (it’s just messing with units, really).
Other Standardized Distributions
Example: Let’s change these test scores into ETS type scores (mean 500, stdev 100).

            Score
Theo          5
Max           3
Henry         5
Leticia       7
Charlotte     7
Pedro         8
Tricia        4
Lulu          9
Average       6
St Dev      1.94
Other Standardized Distributions
Here’s how:
1. Convert to Z scores.
2. Multiply by 100 to increase the st dev.
3. Add 500 to increase the mean.
That is, ETS type score = 500 + 100 × Z-score.

            Score   Z-Score   ETS type score
Average       6        0           500
St Dev      1.94       1           100

For example, a score of 4 has a Z-score of about -1 and an ETS type score of about 400.
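The three steps can be sketched as a small helper. The name `rescale` is hypothetical, and the sketch assumes the population standard deviation (`statistics.pstdev`), which matches the 1.94 quoted for these scores.

```python
from statistics import mean, pstdev

def rescale(scores, new_mean, new_sd):
    """Convert scores to z-scores, then stretch and shift to a new scale."""
    m, sd = mean(scores), pstdev(scores)
    z = [(x - m) / sd for x in scores]            # step 1: convert to z-scores
    return [new_mean + new_sd * zi for zi in z]   # steps 2 and 3: multiply, then add

names = ["Theo", "Max", "Henry", "Leticia", "Charlotte", "Pedro", "Tricia", "Lulu"]
scores = [5, 3, 5, 7, 7, 8, 4, 9]

ets_like = rescale(scores, new_mean=500, new_sd=100)

for name, raw, new in zip(names, scores, ets_like):
    print(f"{name:10s} {raw:3d} {new:7.0f}")
print("mean:", round(mean(ets_like), 6), "st dev:", round(pstdev(ets_like), 6))
```

The same helper produces any other standardized scale by changing `new_mean` and `new_sd`.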
Exercise

            Score   Percentile   Deviation Score   Z-Score   IQ type score (Mean 100, Stdev 10)
Theo         20
Max          18
Henry        13
Leticia      17
Charlotte    19
Pedro        16
Tricia       11
Lulu          9