Statistical Methods in Computer Science Data 2: Central Tendency & Variability Ido Dagan.


1 Statistical Methods in Computer Science Data 2: Central Tendency & Variability Ido Dagan

2 Empirical Methods in Computer Science © 2006-now Gal Kaminka 2 Frequency Distributions and Scales

3 Characteristics of Distributions: shape, central tendency, variability. (Figures: distributions with different central tendency; distributions with different variability.)

4 This Lesson. Examine measures of central tendency: Mode (nominal), Median (ordinal), Mean (numerical). Examine measures of variability (dispersion): Entropy (nominal), Variance and Standard Deviation (numerical). Standard scores (z-scores).

5 Centrality/Variability Measures and Scales

6 The Mode (Mo). The mode of a variable is its most frequent value: $\mathrm{Mo} = \arg\max_x f(x)$, where $f(x)$ is the frequency of value $x$. For a categorical variable, it is the category that appears most often. For grouped data, it is the midpoint of the most frequent interval, under the assumption that values are evenly distributed within the interval.
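A minimal Python sketch of the argmax definition, assuming raw (ungrouped) values; the function name `modes` is illustrative. It returns every value that attains the maximum frequency, so a multi-modal variable yields more than one entry.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency f(x), i.e. argmax_x f(x).

    More than one result means the variable is multi-modal.
    """
    counts = Counter(values)  # f(x) for every observed value x
    if not counts:
        raise ValueError("mode of empty data is undefined")
    top = max(counts.values())
    # Keep order of first appearance, so categorical values need not be sortable.
    return [x for x, c in counts.items() if c == top]

print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([1, 2, 2, 3, 3]))                  # [2, 3]  (bimodal)
```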

7 Finding the Mode: Example 1. (Table: the collection of values that a variable X took during the measurement.) Mo = ? It depends on the grouping.

8 Finding the Mode: Example 2. The mode of a grouped frequency distribution depends on the grouping. (Figure: alternative groupings of the same data, with modes of 86, 88, and 87.)
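The dependence on grouping can be reproduced with a short sketch. The data below are hypothetical (not the values behind the slide), and `grouped_mode`, `width`, and `origin` are illustrative names. Each interval is taken as [origin + k·width, origin + (k+1)·width), and ties go to the lowest interval, which is only one possible convention.

```python
import math
from collections import Counter

def grouped_mode(values, width, origin):
    """Midpoint of the most frequent interval [origin + k*width, origin + (k+1)*width)."""
    bins = Counter(math.floor((v - origin) / width) for v in values)
    top = max(bins.values())
    k = min(b for b, c in bins.items() if c == top)  # tie-break: lowest interval
    return origin + (k + 0.5) * width

data = [81, 84, 85, 86, 87, 88, 88, 89, 93]  # hypothetical scores; the raw mode is 88
print(grouped_mode(data, width=2, origin=80))  # 89.0
print(grouped_mode(data, width=3, origin=81))  # 88.5
print(grouped_mode(data, width=5, origin=80))  # 87.5
```

The same nine scores give three different grouped modes, none equal to the raw mode.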

9 The Median (Mdn). The median of a variable is its 50th percentile, P50: the point below which 50% of all measurements fall. It requires ordering, so it applies only to ordinal and numerical scales. Examples: 0, 8, 8, 11, 15, 16, 20 ==> Mdn = 11. 12, 14, 15, 18, 19, 20 ==> Mdn = 16.5 (halfway between 15 and 18).
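The two examples can be checked with a minimal Python sketch of the "middle score, or halfway between the two middle scores" rule; the function name is illustrative. Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """50th percentile of ungrouped data: middle score, or mean of the two middle scores."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # odd N: the single middle score
    return (s[mid - 1] + s[mid]) / 2  # even N: halfway between the two middle scores

print(median([0, 8, 8, 11, 15, 16, 20]))  # 11
print(median([12, 14, 15, 18, 19, 20]))   # 16.5
```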

14 The Median (Mdn), continued. 5, 7, 8, 8, 8, 8 ==> Mdn = ? One method: take the point halfway between the first and second 8 (the third and fourth scores), so Mdn = 8. Another: use linear interpolation, as we did with intervals, so Mdn = 7.75. Here 7.75 = 7.5 + (¼ × 1.0), where 7.5 is the real limit between 7 and 8; ¼ reflects that 1 of the four 8's is needed to reach 50% of the scores (two of the six scores lie below 7.5, and three are needed); and 1.0 is the width of the interval containing the 8's (its real limits, 7.5 to 8.5).
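The interpolation generalizes to Mdn = L + w · (N/2 − cf) / f, where L is the lower real limit of the interval holding the middle score, w its width, cf the number of scores below L, and f the number of scores inside the interval. Below is a minimal Python sketch, assuming each score x occupies the real-limit interval [x − w/2, x + w/2); the function name is illustrative. Python's standard `statistics.median_grouped` is built on the same formula.

```python
from bisect import bisect_left, bisect_right

def interpolated_median(values, width=1.0):
    """Median by linear interpolation within the interval holding the middle score.

    Assumes each score x occupies [x - width/2, x + width/2) (its real limits).
    """
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    x = s[n // 2]                # a score inside the median interval
    cf = bisect_left(s, x)       # scores below the interval
    f = bisect_right(s, x) - cf  # scores inside the interval (ties with x)
    lower = x - width / 2        # lower real limit, e.g. 7.5 for x = 8
    return lower + width * (n / 2 - cf) / f

print(interpolated_median([5, 7, 8, 8, 8, 8]))  # 7.75 = 7.5 + (1/4) * 1.0
```

The two methods can also disagree for even N without ties: for 12, 14, 15, 18, 19, 20 this sketch returns 17.5 rather than 16.5, because every point between the real limits 15.5 and 17.5 has exactly half of the scores below it, and the formula lands on the lower real limit of 18.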

15 Arithmetic Mean (the mean, for short). "Average" is colloquial and not precisely defined when used, so we avoid the term. The arithmetic mean of N scores $X_1, \dots, X_N$ is $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$.
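A minimal Python sketch of the definition, computed both from raw scores and from a frequency table {x: f(x)}; the function names are illustrative, and the data reuse the last median example.

```python
def mean(values):
    """Arithmetic mean: the sum of the scores divided by N."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

def mean_from_frequencies(freq):
    """Mean of a frequency distribution {x: f(x)}: sum of f(x) * x, divided by N = sum of f(x)."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n

print(mean([5, 7, 8, 8, 8, 8]))                   # 7.333333333333333
print(mean_from_frequencies({5: 1, 7: 1, 8: 4}))  # same data, same mean
```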

16 Properties of Central Tendency Measures. Mode (Mo): relatively unstable between samples; problematic in grouped distributions, since it depends on the grouping; there can be more than one (distributions with more than one mode are sometimes called multi-modal); for uniform distributions, every value is a possible mode. Typically used only on nominal data.

17 Properties of Central Tendency Measures. Mean: responsive to the exact value of each score; applicable only to interval and ratio scales; takes the total of the scores into account and does not ignore any value. The sum of deviations from the mean is always zero, $\sum_{i=1}^{N} (X_i - \bar{X}) = 0$; because of this, the mean is sensitive to outliers, that is, to the presence or absence of scores at extreme values. It is stable between samples and is the basis for many other statistical measures.
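The zero-sum property, and the pull that a single extreme score exerts on the mean, can be checked exactly in Python by using `fractions.Fraction` to avoid floating-point rounding. The scores come from the median slide; the added value 200 and the helper name `exact_mean` are hypothetical.

```python
from fractions import Fraction

def exact_mean(values):
    """Mean as an exact fraction, so the zero-sum check is not blurred by rounding."""
    return Fraction(sum(values), len(values))

scores = [12, 14, 15, 18, 19, 20]   # from the median slide
m = exact_mean(scores)              # 49/3, about 16.33
deviations = [x - m for x in scores]
print(sum(deviations))              # 0

with_outlier = scores + [200]       # hypothetical extreme score
print(float(exact_mean(with_outlier)))  # about 42.57: one score moved the mean a lot
```

Because the deviations must cancel, one large positive deviation can only be balanced by moving the mean toward it.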

18 Properties of Central Tendency Measures. Median: robust to extreme values; it only cares about ordering, not the magnitude of intervals; often used with skewed distributions. (Figure: positions of Mo, Mdn, and Mean.)
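A short check with Python's standard `statistics` module shows that the median depends only on ordering: making the largest score far larger leaves the median untouched, while the mean jumps. The first list is from the median slide; the stretched version is hypothetical.

```python
from statistics import mean, median

original = [0, 8, 8, 11, 15, 16, 20]     # from the median slide
stretched = [0, 8, 8, 11, 15, 16, 2000]  # hypothetical: largest score made extreme

print(median(original), median(stretched))  # 11 11: ordering unchanged, median unchanged
print(mean(original), mean(stretched))      # about 11.14, then 294
```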

19 Properties of Central Tendency Measures: Contrasting Mode, Median, Mean. (Figure: positions of Mo, Mdn, and Mean.)
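For right-skewed data the three measures often fall in the order Mo < Mdn < Mean, although this is a common pattern rather than a rule. A short Python check on a hypothetical sample, using the standard `statistics` module:

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most scores are low, a few are high.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]

mo, mdn, avg = mode(sample), median(sample), mean(sample)
print(mo, mdn, avg)  # mode 2 < median 3 < mean 4
```

The long right tail (8 and 10) pulls the mean upward, the median sits at the middle of the ordering, and the mode stays at the peak.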
