Statistical Methods in Computer Science Data 2: Central Tendency & Variability Ido Dagan.

Presentation transcript:

Statistical Methods in Computer Science Data 2: Central Tendency & Variability Ido Dagan

Empirical Methods in Computer Science © 2006-now Gal Kaminka
Frequency Distributions and Scales

Characteristics of Distributions
  - Shape, central tendency, variability
  [Figure: distributions with different central tendency; distributions with different variability]

This Lesson
  - Examine measures of central tendency
      - Mode (nominal)
      - Median (ordinal)
      - Mean (numerical)
  - Examine measures of variability (dispersion)
      - Entropy (nominal)
      - Variance (numerical), standard deviation
      - Standard scores (z-scores)

Centrality/Variability Measures and Scales

The Mode (Mo; Hebrew: השכיח)
  - The mode of a variable is its most frequent value: Mo = argmax_x f(x), where f(x) is the frequency of x
  - For a categorical variable: the category that appears most often
  - For grouped data: the midpoint of the most frequent interval, under the assumption that values are evenly distributed within the interval
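
The definition Mo = argmax f(x) can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `modes` and the sample data are assumptions made for the example. It returns every value that reaches the top frequency, since a variable can have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return every value whose frequency equals the maximum frequency."""
    counts = Counter(values)          # f(x): frequency of each distinct value
    top = max(counts.values())        # highest frequency observed
    return [v for v, c in counts.items() if c == top]

# Hypothetical categorical sample
colors = ["red", "blue", "red", "green", "blue", "red"]
print(modes(colors))  # ['red']
```

Returning a list rather than a single value keeps ties visible instead of silently picking one of them.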

Finding the Mode: Example 1
  - Data: the collection of values that a variable X took during the measurement
  - The mode? It depends on the grouping

Finding the Mode: Example 2
  - The mode of a grouped frequency distribution depends on the grouping
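
The dependence on grouping can be seen with a small sketch. Everything here is assumed for illustration: the function name `grouped_mode`, the sample scores, and the use of half-open intervals [low + k*width, low + (k+1)*width) instead of the real limits a textbook table would use.

```python
from collections import Counter

def grouped_mode(values, low, width):
    """Midpoint of the most frequent interval of the given width.

    Intervals are [low + k*width, low + (k+1)*width). On ties, the
    interval that was seen first is returned.
    """
    bins = Counter(int((v - low) // width) for v in values)
    k = max(bins, key=bins.get)       # index of the most frequent interval
    return low + k * width + width / 2

# Hypothetical scores: same data, two different groupings
scores = [2, 2, 3, 3, 5, 6, 7, 8, 9]
print(grouped_mode(scores, low=0, width=5))  # 7.5
print(grouped_mode(scores, low=0, width=2))  # 3.0
```

With width 5 the interval [5, 10) holds five scores and wins; with width 2 the interval [2, 4) holds four scores and wins, so the reported mode moves from 7.5 to 3.0 without any change in the data.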

The Median (Mdn; Hebrew: החציון)
  - The median of a variable is its 50th percentile, P50: the point below which 50% of all measurements fall
  - Requires ordering: defined only for ordinal and numerical scales
  - Examples:
      - 0, 8, 8, 11, 15, 16, 20 ==> Mdn = 11
      - 12, 14, 15, 18, 19, 20 ==> Mdn = 16.5 (halfway between 15 and 18)
      - 5, 7, 8, 8, 8, 8 ==> Mdn = ?
          - One method: halfway between the first and second 8, so Mdn = 8
          - Another: use linear interpolation, as we did with intervals, so Mdn = 7.75
            7.75 = 7.5 + (1/4 * 1.0), where 7.5 is the lower real limit of the interval containing the 8's (between 7 and 8), 1/4 is 1 of the four 8's, and 1.0 is the width of the interval containing the 8's (real limits)
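
Both ways of handling the tied example can be sketched as follows. The function names and the `unit` parameter are assumptions for this illustration; `median_interpolated` treats each score v as the interval [v - unit/2, v + unit/2] (its real limits) and interpolates linearly inside the interval that contains the 50% point.

```python
def median_simple(values):
    """Middle score, or the value halfway between the two middle scores."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def median_interpolated(values, unit=1.0):
    """Median by linear interpolation within the real limits of a score."""
    xs = sorted(values)
    target = len(xs) / 2              # number of scores that must fall below Mdn
    below = 0                         # scores in intervals already passed
    for v in sorted(set(xs)):
        count = xs.count(v)
        if below + count >= target:
            lower = v - unit / 2      # lower real limit of this interval
            return lower + (target - below) / count * unit
        below += count

print(median_simple([0, 8, 8, 11, 15, 16, 20]))     # 11
print(median_simple([12, 14, 15, 18, 19, 20]))      # 16.5
print(median_simple([5, 7, 8, 8, 8, 8]))            # 8.0
print(median_interpolated([5, 7, 8, 8, 8, 8]))      # 7.75
```

Note that the interpolated version assumes unit-wide real limits around every score, so on data with gaps between scores it need not agree with the halfway rule.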

The Arithmetic Mean (Hebrew: ממוצע חשבוני)
  - Arithmetic mean (mean, for short)
  - "Average" is colloquial and not precisely defined when used, so we avoid the term
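
For reference, the standard definition of the arithmetic mean of N scores can be written as below. The symbols x_i, N, and \bar{x} are assumed notation for this sketch and are not taken from the slides.

```latex
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
```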

Properties of Central Tendency Measures
  - Mode (Mo):
      - Relatively unstable between samples
      - Problematic in grouped distributions
      - There can be more than one: distributions with more than one mode are sometimes called multi-modal
      - For uniform distributions, all values are possible modes
      - Typically used only on nominal data

Properties of Central Tendency Measures
  - Mean:
      - Responsive to the exact value of each score
      - Only for interval and ratio scales
      - Takes the total of the scores into account: does not ignore any value
      - The sum of deviations from the mean is always zero
      - Because of this, sensitive to outliers: the presence or absence of scores at extreme values
      - Stable between samples, and the basis for many other statistical measures
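
Two of these properties are easy to check numerically. The sketch below uses hypothetical scores and helper names (`mean`, `deviations`) chosen for the example: it confirms that the deviations from the mean sum to zero and shows how far a single extreme score moves the mean.

```python
import math

def mean(xs):
    """Arithmetic mean: total of the scores divided by their number."""
    return sum(xs) / len(xs)

def deviations(xs):
    """Signed distance of each score from the mean."""
    m = mean(xs)
    return [x - m for x in xs]

scores = [2.0, 4.0, 4.0, 5.0, 10.0]         # hypothetical sample
print(mean(scores))                          # 5.0
print(math.isclose(sum(deviations(scores)), 0.0, abs_tol=1e-9))  # True

with_outlier = scores + [95.0]               # add one extreme score
print(mean(with_outlier))                    # 20.0
```

The comparison uses a tolerance because floating-point sums of deviations may differ from zero by rounding error on other data.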

Properties of Central Tendency Measures
  - Median:
      - Robust to extreme values
      - Only cares about ordering, not the magnitude of intervals
      - Often used with skewed distributions
  [Figure labels: Mo, Mdn, Mean]
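
A small right-skewed sample makes the contrast concrete. The data and the helper name `summary` are assumptions for this sketch; the calls are from Python's standard `statistics` module.

```python
import statistics

def summary(xs):
    """Return (mode, median, mean) of a sample."""
    return statistics.mode(xs), statistics.median(xs), statistics.mean(xs)

# Hypothetical right-skewed sample: one score far above the rest
incomes = [20, 20, 20, 25, 30, 35, 200]
print(summary(incomes))               # (20, 25, 50)

# Push the extreme score even further out: only the mean reacts
stretched = incomes[:-1] + [2000]
print(summary(stretched)[:2])         # (20, 25)
print(summary(stretched)[2] > 300)    # True
```

In this sample the three measures fall in the order Mo < Mdn < Mean, and replacing 200 with 2000 leaves the mode and median untouched because the ordering of the scores has not changed.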

Properties of Central Tendency Measures
  - Contrasting mode, median, mean
  [Figure labels: Mo, Mdn, Mean]