Central Tendency & Dispersion

Slides:

Advertisements

Similar presentations

Descriptive Measures MARE 250 Dr. Jason Turner.

Advertisements

Agricultural and Biological Statistics

Descriptive Statistics

Calculating & Reporting Healthcare Statistics

B a c kn e x t h o m e Parameters and Statistics statistic A statistic is a descriptive measure computed from a sample of data. parameter A parameter is.

PSY 307 – Statistics for the Behavioral Sciences

Analysis of Research Data

Measures of Variability

Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Introduction to Statistics Chapter 3 Using Statistics to summarize.

1 Measures of Central Tendency Greg C Elvers, Ph.D.

Measures of Central Tendency

Today: Central Tendency & Dispersion

Numerical Measures of Central Tendency. Central Tendency Measures of central tendency are used to display the idea of centralness for a data set. Most.

Measures of Central Tendency

Unit 3 Sections 3-2 – Day : Properties and Uses of Central Tendency The Mean  One computes the mean by using all the values of the data.  The.

Describing Data: Numerical

Chapter 3 Descriptive Measures

AP Statistics Chapters 0 & 1 Review. Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of.

Describing distributions with numbers

Objective To understand measures of central tendency and use them to analyze data.

Frequency Distribution I. How Many People Made Each Possible Score? A. This is something I show you for each quiz.

BIOSTATISTICS II. RECAP ROLE OF BIOSATTISTICS IN PUBLIC HEALTH SOURCES AND FUNCTIONS OF VITAL STATISTICS RATES/ RATIOS/PROPORTIONS TYPES OF DATA CATEGORICAL.

Descriptive Statistics Used to describe the basic features of the data in any quantitative study. Both graphical displays and descriptive summary statistics.

Measurements of Central Tendency. Statistics vs Parameters Statistic: A characteristic or measure obtained by using the data values from a sample. Parameter:

Data Analysis and Statistics. When you have to interpret information, follow these steps: Understand the title of the graph Read the labels Analyze pictures.

Chapter 3: Central Tendency. Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately.

Measures of Central Tendency or Measures of Location or Measures of Averages.

1.3 Psychology Statistics AP Psychology Mr. Loomis.

Overview Summarizing Data – Central Tendency - revisited Summarizing Data – Central Tendency - revisited –Mean, Median, Mode Deviation scores Deviation.

© Copyright McGraw-Hill CHAPTER 3 Data Description.

PPA 501 – Analytical Methods in Administration Lecture 5a - Counting and Charting Responses.

Describing Behavior Chapter 4. Data Analysis Two basic types  Descriptive Summarizes and describes the nature and properties of the data  Inferential.

M07-Numerical Summaries 1 1  Department of ISM, University of Alabama, Lesson Objectives  Learn when each measure of a “typical value” is appropriate.

1 PUAF 610 TA Session 2. 2 Today Class Review- summary statistics STATA Introduction Reminder: HW this week.

Describing distributions with numbers

Skewness & Kurtosis: Reference

An Introduction to Statistics. Two Branches of Statistical Methods Descriptive statistics Techniques for describing data in abbreviated, symbolic fashion.

1 Univariate Descriptive Statistics Heibatollah Baghi, and Mastee Badii George Mason University.

Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.

INVESTIGATION 1.

Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.

Descriptive Statistics: Presenting and Describing Data.

Basic Measurement and Statistics in Testing. Outline Central Tendency and Dispersion Standardized Scores Error and Standard Error of Measurement (Sm)

Measures of Central Tendency: The Mean, Median, and Mode

Descriptive Statistics The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.

1 Descriptive Statistics 2-1 Overview 2-2 Summarizing Data with Frequency Tables 2-3 Pictures of Data 2-4 Measures of Center 2-5 Measures of Variation.

LIS 570 Summarising and presenting data - Univariate analysis.

Descriptive Statistics for one Variable. Variables and measurements A variable is a characteristic of an individual or object in which the researcher.

Outline of Today’s Discussion 1.Displaying the Order in a Group of Numbers: 2.The Mean, Variance, Standard Deviation, & Z-Scores 3.SPSS: Data Entry, Definition,

Descriptive Statistics for one variable. Statistics has two major chapters: Descriptive Statistics Inferential statistics.

Descriptive Statistics(Summary and Variability measures)

Data Description Chapter 3. The Focus of Chapter 3  Chapter 2 showed you how to organize and present data.  Chapter 3 will show you how to summarize.

Statistics Josée L. Jarry, Ph.D., C.Psych. Introduction to Psychology Department of Psychology University of Toronto June 9, 2003.

Chapter 6: Descriptive Statistics. Learning Objectives Describe statistical measures used in descriptive statistics Compute measures of central tendency.

CHAPTER 3 – Numerical Techniques for Describing Data 3.1 Measures of Central Tendency 3.2 Measures of Variability.

Describing Data: Summary Measures. Identifying the Scale of Measurement Before you analyze the data, identify the measurement scale for each variable.

Chapter 4: Measures of Central Tendency. Measures of central tendency are important descriptive measures that summarize a distribution of different categories.

Making Sense of Statistics: A Conceptual Overview Sixth Edition PowerPoints by Pamela Pitman Brown, PhD, CPG Fred Pyrczak Pyrczak Publishing.

Section 2.1 Visualizing Distributions: Shape, Center, and Spread.

Descriptive measures Capture the main 4 basic Ch.Ch. of the sample distribution: Central tendency Variability (variance) Skewness kurtosis.

Measures of Central Tendency

Descriptive Statistics: Presenting and Describing Data

CHAPTER 3 Data Description 9/17/2018 Kasturiarachi.

STA 291 Spring 2008 Lecture 5 Dustin Lueker.

STA 291 Spring 2008 Lecture 5 Dustin Lueker.

Numerical Descriptive Measures

MEASURES OF CENTRAL TENDENCY

MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.

Advanced Algebra Unit 1 Vocabulary

Presentation transcript:

Central Tendency & Dispersion Types of Distributions: Normal, Skewed Central Tendency: Mean, Median, Mode Dispersion: Variance, Standard Deviation This PowerPoint has been ripped off from I don’t know where, and improved upon by yours truly…Mrs. T  Enjoy!

What is the shape [of the distribution]? DESCRIPTIVE STATISTICS are concerned with describing the characteristics of frequency distributions Where is the center? What is the range? What is the shape [of the distribution]?

Frequency Table Test Scores What is the range of test scores? A: 30 (95 minus 65) When calculating mean, one must divide by what number? Observation Frequency (scores) (# occurrences) 65 1 70 2 75 3 80 4 85 3 90 2 95 1 A: 16 (total # occurrences)

Frequency Distributions 4 3 2 1 Frequency (# occurrences) 65 70 75 80 85 90 95 Test Score

Normally Distributed Curve

Voter Turnout in 50 States - 1980

Skewed Distributions We say the distribution is skewed to the left  (when the “tail” is to the left) We say the distribution is skewed to the right  (when the “tail” is to the right)

Voter Turnout in 50 States - 1940 Q: Is this distribution, positively or negatively skewed? A: Negatively Q: Would we say this distribution is skewed to the left or right? A: Left (skewed in direction of tail)

Characteristics - Normal Distribution It is symmetrical - half the values are to one side of the center (mean), and half the values are on the other side. The distribution is single-peaked, not bimodal or multi-modal. Most of the data values will be “bunched” near the center portion of the curve. As values become more extreme they become less frequent with the “outliers” being found at the “tails” of the distribution and are few in number. The Mean, Median, and Mode are the same in a perfectly symmetrical normal distribution. Percentage of values that occur in any range of the curve can be calculated using the Empirical Rule.

Empirical Rule

Summarizing Distributions Two key characteristics of a frequency distribution are especially important when summarizing data or when making a prediction: CENTRAL TENDENCY What is in the “middle”? What is most common? What would we use to predict? DISPERSION How spread out is the distribution? What shape is it?

The MEASURES of Central Tendency 3 measures of central tendency are commonly used in statistical analysis - MEAN, MEDIAN, and MODE. Each measure is designed to represent a “typical” value in the distribution. The choice of which measure to use depends on the shape of the distribution (whether normal or skewed).

Mean - Average Most common measure of central tendency. Is sensitive to the influence of a few extreme values (outliers), thus it is not always the most appropriate measure of central tendency. Best used for making predictions when a distribution is more or less normal (or symmetrical). Symbolized as: x for the mean of a sample μ for the mean of a population

Finding the Mean Formula for Mean: X = (Σ x) N Given the data set: {3, 5, 10, 4, 3} X = (3 + 5 + 10 + 4 + 3) = 25 5 5 X = 5

Find the Mean Q: 85, 87, 89, 91, 98, 100 A: 91.67 Median: 90 A: 78.3 (Extremely low score lowered the Mean) Median: 90 (The median remained unchanged.)

Median Used to find middle value (center) of a distribution. Used when one must determine whether the data values fall into either the upper 50% or lower 50% of a distribution. Used when one needs to report the typical value of a data set, ignoring the outliers (few extreme values in a data set). Example: median salary, median home prices in a market Is a better indicator of central tendency than mean when one has a skewed distribution.

To compute the median first you order the values of X from low to high:  85, 90, 94, 94, 95, 97, 97, 97, 97, 98 then count number of observations = 10. When the number of observations are even, average the two middle numbers to calculate the median. This example, 96 is the median (middle) score.

Median Find the Median 4 5 6 6 7 8 9 10 12 5 6 6 7 8 9 10 12 5 6 6 7 8 9 10 100,000

Mode Used when the most typical (common) value is desired. Often used with categorical data. The mode is not always unique. A distribution can have no mode, one mode, or more than one mode. When there are two modes, we say the distribution is bimodal. EXAMPLES: {1,0,5,9,12,8} - No mode {4,5,5,5,9,20,30} – mode = 5 {2,2,5,9,9,15} - bimodal, mode 2 and 9

Measures of Variability Central Tendency doesn’t tell us everything Dispersion/Deviation/Spread tells us a lot about how the data values are distributed. We are most interested in: Standard Deviation (σ) and Variance (σ2)

Why can’t the mean tell us everything? Mean describes the average outcome. The question becomes how good a representation of the distribution is the mean? How good is the mean as a description of central tendency -- or how accurate is the mean as a predictor? ANSWER -- it depends on the shape of the distribution. Is the distribution normal or skewed?

Dispersion Once you determine that the data of interest is normally distributed, ideally by producing a histogram of the values, the next question to ask is: How spread out are the values about the mean? Dispersion is a key concept in statistical thinking. The basic question being asked is how much do the values deviate from the Mean? The more “bunched up” around the mean the better your ability to make accurate predictions.

Means Consider these means for hours worked day each day: X = {7, 8, 6, 7, 7, 6, 8, 7} X = (7+8+6+7+7+6+8+7)/8 X = 7 Notice that all the data values are bunched near the mean. Thus, 7 would be a pretty good prediction of the average hrs. worked each day. X = {12, 2, 0, 14, 10, 9, 5, 4} X = (12+2+0+14+10+9+5+4)/8 X = 7 The mean is the same for this data set, but the data values are more spread out. So, 7 is not a good prediction of hrs. worked on average each day.

Data is more spread out, meaning it has greater variability. Below, the data is grouped closer to the center, less spread out, or smaller variability.

How well does the mean represent the values in a distribution? The logic here is to determine how much spread is in the values. How much do the values "deviate" from the mean? Think of the mean as the true value, or as your best guess. If every X were very close to the Mean, the Mean would be a very good predictor. If the distribution is very sharply peaked then the mean is a good measure of central tendency and if you were to use the Mean to make predictions you would be correct or very close much of the time.

What if scores are widely distributed? The mean is still your best measure and your best predictor, but your predictive power would be less. How do we describe this? Measures of variability Mean Absolute Deviation (You used in Math1) Variance (We use in Math 2) Standard Deviation (We use in Math 2)

Mean Absolute Deviation The key concept for describing normal distributions and making predictions from them is called deviation from the mean. We could just calculate the average distance between each observation and the mean. We must take the absolute value of the distance, otherwise they would just cancel out to zero! Formula:

Mean Absolute Deviation: An Example Data: X = {6, 10, 5, 4, 9, 8} X = 42 / 6 = 7 X – Xi Abs. Dev. 7 – 6 1 7 – 10 3 7 – 5 2 7 – 4 7 – 9 7 – 8 Compute X (Average) Compute X – X and take the Absolute Value to get Absolute Deviations Sum the Absolute Deviations Divide the sum of the absolute deviations by N Total: 12 12 / 6 = 2

What Does it Mean? Is it Really that Easy? On Average, each value is two units away from the mean. Is it Really that Easy? No! Absolute values are difficult to manipulate algebraically Absolute values cause enormous problems for calculus (Discontinuity) We need something else…

Variance and Standard Deviation Instead of taking the absolute value, we square the deviations from the mean. This yields a positive value. This will result in measures we call the Variance and the Standard Deviation Sample - Population - s Standard Deviation σ Standard Deviation s2 Variance σ2 Variance

Calculating the Variance and/or Standard Deviation Formulae: Variance: Examples Follow . . . Standard Deviation:

Example: Mean: 6 10 5 4 9 8 -1 1 3 9 -2 4 -3 2 Variance: Data: X = {6, 10, 5, 4, 9, 8}; N = 6 Mean: 6 10 5 4 9 8 -1 1 3 9 -2 4 -3 2 Variance: Standard Deviation: Total: 42 Total: 28