1
Statistical Methods in Computer Science
Data 2: Central Tendency & Variability
Ido Dagan
2
Empirical Methods in Computer Science © 2006-now Gal Kaminka 2 Frequency Distributions and Scales
3
Characteristics of Distributions
Shape, central tendency, variability
[Figure panels: Different Central Tendency; Different Variability]
4
This Lesson
- Examine measures of central tendency
  - Mode (nominal)
  - Median (ordinal)
  - Mean (numerical)
- Examine measures of variability (dispersion)
  - Entropy (nominal)
  - Variance (numerical), Standard Deviation
  - Standard scores (z-scores)
5
Centrality/Variability Measures and Scales
6
The Mode (Mo)
The mode of a variable is its most frequent value: $\mathrm{Mo} = \arg\max_x f(x)$, where $f(x)$ is the frequency of value $x$.
- For a categorical variable: the category that appeared most often
- For grouped data: the midpoint of the most frequent interval, under the assumption that values are evenly distributed within the interval
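A minimal Python sketch of both definitions. It assumes half-open intervals [start + k·width, start + (k+1)·width) for grouped data; `mode_categorical` and `mode_grouped` are hypothetical helper names, and ties are broken by whichever value or interval appears first in the data.

```python
from collections import Counter

def mode_categorical(values):
    """Most frequent value; ties go to the value encountered first."""
    counts = Counter(values)
    return counts.most_common(1)[0][0]

def mode_grouped(values, width, start):
    """Midpoint of the most frequent interval [start + k*width, start + (k+1)*width).

    Assumes values are evenly spread within each interval, so the
    midpoint stands in for every value in it.
    """
    counts = Counter((v - start) // width for v in values)  # interval index -> frequency
    k = counts.most_common(1)[0][0]                          # most frequent interval
    return start + k * width + width / 2

print(mode_categorical(["red", "blue", "red", "green"]))     # red
print(mode_grouped([1, 2, 2, 3, 7, 8, 9], width=5, start=0))  # 2.5
```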
7
Finding the Mode: Example 1
[Figure: the collection of values that a variable X took during the measurement]
Mo = ? It depends on the grouping.
8
Finding the Mode: Example 2
The mode of a grouped frequency distribution depends on the grouping.
[Figure: the same data under different groupings; values shown: 86, 88, 87]
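To see the grouping effect concretely, the made-up scores below (not the data from the slide) are grouped into width-4 intervals starting at two different points. `mode_grouped` is a hypothetical helper that returns the midpoint of the most frequent half-open interval.

```python
from collections import Counter

def mode_grouped(values, width, start):
    # Hypothetical helper: midpoint of the most frequent interval
    # [start + k*width, start + (k+1)*width).
    counts = Counter((v - start) // width for v in values)
    k = counts.most_common(1)[0][0]
    return start + k * width + width / 2

scores = [80, 81, 81, 84, 85, 86, 87]       # hypothetical scores
raw_mode = Counter(scores).most_common(1)[0][0]

print(raw_mode)                                 # 81 (ungrouped mode)
print(mode_grouped(scores, width=4, start=80))  # 86.0
print(mode_grouped(scores, width=4, start=78))  # 80.0
```

Shifting the interval boundaries by 2, with the same width, moves the grouped mode from 86.0 to 80.0, and neither matches the ungrouped mode of 81.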
9
The Median (Mdn)
The median of a variable is its 50th percentile, P50: the point below which 50% of all measurements fall.
It requires ordering, so it applies only to the ordinal and numerical scales.
Examples:
- 0, 8, 8, 11, 15, 16, 20 ==> Mdn = 11
- 12, 14, 15, 18, 19, 20 ==> Mdn = 16.5 (halfway between 15 and 18)
- 5, 7, 8, 8, 8, 8 ==> Mdn = ?
  - One method: halfway between the first and second 8, Mdn = 8
  - Another: use linear interpolation, as we did for intervals, Mdn = 7.75
    7.75 = 7.5 + (¼ × 1.0), where 7.5 is the lower real limit of the 8's (between 7 and 8); ¼ because half of the 6 scores is 3, two of them (5 and 7) lie below 7.5, so we need 1 of the four 8's; and 1.0 is the width of the interval containing the 8's (real limits)
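Both rules for 5, 7, 8, 8, 8, 8 can be checked in Python. `statistics.median` averages the two middle scores, and `statistics.median_grouped` applies the interpolation formula with real limits of width 1. `median_interpolated` is a hypothetical hand-written version of the same interpolation, included only to make each term of 7.5 + (¼ × 1.0) visible.

```python
import statistics
from bisect import bisect_left, bisect_right

def median_interpolated(values, width=1.0):
    """Median by linear interpolation within the real limits of the
    interval that contains the 50th percentile."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    x = data[n // 2]                      # value whose interval holds P50
    below = bisect_left(data, x)          # scores below that interval
    freq = bisect_right(data, x) - below  # scores inside it (all equal to x)
    lower = x - width / 2                 # lower real limit, e.g. 7.5 for 8
    return lower + width * (n / 2 - below) / freq

scores = [5, 7, 8, 8, 8, 8]
print(statistics.median(scores))          # 8.0  (halfway between the middle two)
print(median_interpolated(scores))        # 7.75 (= 7.5 + 1/4 * 1.0)
print(statistics.median_grouped(scores))  # 7.75
```

The two rules can also disagree when there is a gap between the middle scores: for 12, 14, 15, 18, 19, 20 the interpolation returns 17.5 (the lower real limit of 18), while the halfway rule gives 16.5. Half of the scores fall below either point.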
15
Arithmetic Mean (mean, for short)
"Average" is colloquial: it is not precisely defined when used, so we avoid the term.
The Arithmetic Mean: the sum of the scores divided by their number, $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$
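A worked instance of the definition (sum of the scores divided by their number), applied to the first data set from the median examples; the symbols $\bar{X}$, $X_i$ and $N$ are assumed notation, not taken from the slide:

```latex
\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i
        = \frac{0 + 8 + 8 + 11 + 15 + 16 + 20}{7}
        = \frac{78}{7} \approx 11.14
```

For these scores the mean (about 11.14) lies close to the median of 11.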
16
Properties of Central Tendency Measures
Mo:
- Relatively unstable between samples
- Problematic in grouped distributions (it depends on the grouping)
- There can be more than one: distributions with more than one mode are sometimes called multimodal
- For uniform distributions, all values are possible modes
- Typically used only on nominal data
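Python's standard library reports all modes at once through `statistics.multimode` (available since Python 3.8), which makes the multimodal, uniform, and unstable cases easy to see. The data are made up for illustration.

```python
import statistics

# Two values tie for the highest frequency: a bimodal distribution.
print(statistics.multimode([1, 2, 2, 3, 3, 4]))             # [2, 3]

# Uniform distribution: every value is equally frequent, so all are modes.
print(statistics.multimode([1, 2, 3, 4]))                   # [1, 2, 3, 4]

# Nominal data: the mode is simply the most frequent category.
print(statistics.multimode(["cat", "dog", "cat", "bird"]))  # ['cat']

# Instability: changing a single score moves the mode from 3 to 2.
print(statistics.multimode([1, 2, 2, 3, 3, 3, 4]))          # [3]
print(statistics.multimode([1, 2, 2, 2, 3, 3, 4]))          # [2]
```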
17
Properties of Central Tendency Measures
Mean:
- Responsive to the exact value of each score
- Applies only to interval and ratio scales
- Takes the total of the scores into account: does not ignore any value
- The sum of deviations from the mean is always zero; because of this, the mean is sensitive to outliers (the presence or absence of scores at extreme values)
- Stable between samples, and the basis for many other statistical measures
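A small check of two of these properties with made-up scores. Exact fractions (`fractions.Fraction`) show that the deviations from the mean sum to exactly zero, with no floating-point residue, and replacing one score with an extreme value shifts the mean a lot while the median stays put. `exact_mean` is a hypothetical helper name.

```python
from fractions import Fraction
import statistics

def exact_mean(values):
    """Arithmetic mean as an exact fraction: sum of scores / number of scores."""
    return Fraction(sum(values), len(values))

scores = [12, 14, 15, 18, 19, 20]
m = exact_mean(scores)
print(m)                              # 49/3
print(sum(x - m for x in scores))     # 0: deviations from the mean cancel out

# One extreme score pulls the mean far away; the median does not move.
with_outlier = [12, 14, 15, 18, 19, 200]
print(exact_mean(with_outlier))       # 139/3, about 46.3
print(statistics.median(scores), statistics.median(with_outlier))  # 16.5 16.5
```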
18
Properties of Central Tendency Measures
Median:
- Robust to extreme values
- Only cares about the ordering, not the magnitude of the intervals between scores
- Often used with skewed distributions
[Figure: a skewed distribution with Mo, Mdn, and Mean marked]
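For a right-skewed set of hypothetical scores (a long tail of high values), the three measures separate in the order Mo < Mdn < Mean, computed here with Python's `statistics` module:

```python
import statistics

# Hypothetical right-skewed scores: most are small, a few are large.
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

mo = statistics.mode(scores)      # most frequent value
mdn = statistics.median(scores)   # average of the two middle scores (3 and 3)
mean = statistics.mean(scores)    # 45 / 10

print(mo, mdn, mean)              # 2 3.0 4.5
```

Raising the largest score from 14 to 140 would move only the mean; the mode and median depend on frequencies and ordering, not on how far the tail extends.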
19
Properties of Central Tendency Measures
Contrasting Mode, Median, Mean
[Figure: Mo, Mdn, and Mean marked on distributions]