Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.

Announcements
First math problem set will be handed out in Lab on Monday… Due September 20
Today's Class:
The Mean (and relevant mathematical notation)
Measures of Dispersion

Review: Variables / Notation
Each column of a dataset is considered a variable. We'll refer to a column generically as "Y".
Person #   Guns owned (the variable "Y")
1          0
2          3
3          0
4          1
5          1
Note: The total number of cases in the dataset is referred to as "N". Here, N=5.

Equation of Mean: Notation
Each case can be identified by a subscript
$Y_i$ represents the "ith" case of variable Y
i goes from 1 to N
$Y_1$ = value of Y for first case in spreadsheet
$Y_2$ = value for second case, etc.
$Y_N$ = value for last case
Person #   Guns owned (Y)
1          $Y_1$ = 0
2          $Y_2$ = 3
3          $Y_3$ = 0
4          $Y_4$ = 1
5          $Y_5$ = 1

Calculating the Mean
Equation: $\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$
1. Mean of variable Y is represented by Y with a line on top, called "Y-bar"
2. Equals sign means "is calculated by the following…"
3. N refers to the total number of cases for which there is data
Summation (Σ) – will be explained next…

Equation of Mean: Summation
Sigma (Σ): Summation
–Indicates that you should add up a series of numbers: $\sum_{i=1}^{N} Y_i$
The thing on the right ($Y_i$) is the item to be added repeatedly
The things on top (N) and bottom (i = 1) tell you how many times to add up Y-sub-i… AND what numbers to substitute for i.

Equation of Mean: Summation
1. Start with bottom: i = 1.
–The first number to add is Y-sub-1
2. Then, allow i to increase by 1
–The second number to add is Y-sub-2 (i = 2), then Y-sub-3 (i = 3)
3. Keep adding numbers until i = N
–In this case N=5, so stop at Y-sub-5
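The three summation steps above can be sketched as a loop. This is a minimal illustration rather than anything from the original slides: the function name `summation` is hypothetical, and the data are the five gun-ownership values from the notation slide.

```python
# Guns owned by each of the N = 5 people in the lecture's example dataset.
Y = [0, 3, 0, 1, 1]
N = len(Y)


def summation(values):
    """Add up values for i = 1 .. N, mirroring the sigma notation."""
    total = 0
    # Python lists are 0-indexed, so the lecture's Y-sub-i is values[i - 1].
    for i in range(1, len(values) + 1):
        total += values[i - 1]
    return total


total_guns = summation(Y)
print(total_guns)
```

In everyday Python the built-in `sum(Y)` does the same job; the explicit loop is only there to make the role of the index i visible.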

Equation of the Mean: Example 2
Can you calculate mean for gun ownership?
Person #   Guns owned (Y)
1          $Y_1$ = 0
2          $Y_2$ = 3
3          $Y_3$ = 0
4          $Y_4$ = 1
5          $Y_5$ = 1
Answer: $\bar{Y} = \frac{0 + 3 + 0 + 1 + 1}{5} = \frac{5}{5} = 1$
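As a quick check on the gun-ownership example, the mean can be computed directly from its definition (sum of all cases divided by N). The helper name `mean_of` is an assumption made for this sketch.

```python
def mean_of(values):
    """Return Y-bar: the sum of all cases divided by the number of cases N."""
    if not values:
        raise ValueError("the mean is undefined for an empty dataset")
    return sum(values) / len(values)


guns_owned = [0, 3, 0, 1, 1]  # the five cases from the example
y_bar = mean_of(guns_owned)
print(y_bar)  # 1.0
```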

Properties of the Mean
The mean takes into account the value of every case to determine what is "typical"
–In contrast to the mode & median
–Probably the most commonly used measure of "central tendency"
But, it is often good to look at median & mode also!
Disadvantages
–Every case influences outcome… even unusual ones
–Extreme cases affect results a lot
–The mean doesn't give you any information on the shape of the distribution
Cases could be very spread out, or very tightly clustered

The Mean and Extreme Values
[Table: Case | Num CDs | Num CDs (one case changed), with the mean of each column]
Extreme values affect the mean a lot: changing this one case shifts the mean substantially
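To make the point concrete, here is a small sketch with made-up numbers. The CD counts below are hypothetical, not the values from the original slide. Changing a single case moves the mean a long way, while the median stays where it was.

```python
from statistics import mean, median

# Hypothetical CD counts; only the last case differs between the two lists.
cds_original = [10, 20, 30, 40]
cds_one_extreme = [10, 20, 30, 400]

mean_original = mean(cds_original)
mean_extreme = mean(cds_one_extreme)
median_original = median(cds_original)
median_extreme = median(cds_one_extreme)

print("means:  ", mean_original, "->", mean_extreme)
print("medians:", median_original, "->", median_extreme)
```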

Example 1 And, very different groups can have the same mean:

Example 2

Example 3

Interpreting Dispersion Question: What are possible social interpretations of the different distributions (all with the same mean)? Example 1: Individuals cluster around 100 Example 2: Individuals distributed sporadically over range Example 3: Individuals in two groups – near zero and near 200
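The idea that very different groups can share a mean is easy to verify numerically. The three lists below are invented for illustration and only loosely echo the three examples (clustered near 100, spread over the range, two separate groups); they are not the data behind the original histograms.

```python
from statistics import mean

# Hypothetical groups, each constructed to have a mean of 100.
clustered = [90, 95, 100, 105, 110]   # everyone near 100
spread_out = [0, 50, 100, 150, 200]   # scattered across the range
two_groups = [0, 5, 195, 200]         # one group near zero, one near 200

group_means = {
    "clustered": mean(clustered),
    "spread_out": mean(spread_out),
    "two_groups": mean(two_groups),
}
print(group_means)
```

All three means come out the same, so the mean alone cannot tell these groups apart.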

Measures of Dispersion Remember: Goal is to understand your variable… Center of the distribution is only part of the story Important issue: How “spread out” are the cases around the mean? –How “dispersed”, “varied” are your cases? –Are most cases like the “typical” case? Or not?

Measures of Dispersion Some measures of dispersion: 1. Range –Also related: Minimum and Maximum 2. Average Absolute deviation 3. Variance 4. Standard deviation

Minimum and Maximum Minimum: the lowest value of a variable represented in your data Maximum: the highest value of a variable represented in your data Example: In previous histograms about number of CDs owned, the minimum was 0, the maximum was 200.

The Range
The Range is calculated as the maximum minus the minimum
–In case of CD ownership, 200 - 0 = 200
Advantage:
–Easy
Disadvantage:
–1. Easily influenced by extreme values… may not be representative
–2. Doesn't tell you anything about the middle cases
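A minimal sketch of the minimum, maximum, and range follows. The list of CD counts is hypothetical, chosen only so that its minimum (0) and maximum (200) match the figures quoted in the lecture.

```python
# Hypothetical CD counts with minimum 0 and maximum 200, as in the lecture.
cds = [0, 12, 45, 80, 200]

minimum = min(cds)
maximum = max(cds)
value_range = maximum - minimum  # range = maximum minus minimum

print(minimum, maximum, value_range)  # 0 200 200
```

Note that the three middle values could be anything between 0 and 200 without changing the result, which is the second disadvantage listed above.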

The Idea of Deviation Deviation: How much a particular case differs from the mean of all cases Deviation of zero indicates the case has the same value as the mean of all cases –Positive deviation: case has higher value than mean –Negative deviation: case has lower value than mean Extreme positive/negative indicates cases further from mean.

Deviation of a Case
Formula: $d_i = Y_i - \bar{Y}$
Literally, it is the distance from the mean (Y-bar)

Deviation Example
Case   Num CDs   Deviation from mean (32.5)
1      20        -12.5
2      40        7.5
3      0         -32.5
4      70        37.5

Turning the Deviation into a Useful Measure of Dispersion
Idea #1: Add it all up
–The sum of deviation for all cases: $\sum_{i=1}^{N} (Y_i - \bar{Y})$
What is sum of the following? -12.5, 7.5, -32.5, 37.5
Problem: Sum of deviation is always zero
–Because mean is the exact center of all cases
–Cases equally deviate positively and negatively
–Conclusion: You can't measure dispersion this way
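The claim that deviations always sum to zero can be checked with the lecture's four cases. The CD counts used here (20, 40, 0, 70) are inferred from the listed deviations and the stated mean of 32.5, so treat them as a reconstruction.

```python
# CD counts inferred from the deviations (-12.5, 7.5, -32.5, 37.5) and mean 32.5.
cds = [20, 40, 0, 70]

y_bar = sum(cds) / len(cds)
deviations = [y - y_bar for y in cds]  # d_i = Y_i - Y-bar
sum_of_deviations = sum(deviations)

print(y_bar)              # 32.5
print(deviations)         # [-12.5, 7.5, -32.5, 37.5]
print(sum_of_deviations)  # 0.0
```

With other datasets, floating-point rounding may leave a tiny nonzero remainder, so comparing against zero with a tolerance (for example `math.isclose(total, 0, abs_tol=1e-9)`) is the safer check.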

Turning the Deviation into a Useful Measure of Dispersion
Idea #2: Sum up "absolute value" of deviation
–Absolute value makes negative values positive
–Designated by vertical bars: $\sum_{i=1}^{N} |Y_i - \bar{Y}|$
What is sum? -12.5, 7.5, -32.5, 37.5
Answer: 12.5 + 7.5 + 32.5 + 37.5 = 90
–These 4 cases deviate by a total of 90 CDs from the mean
Problem: Sum of Absolute Deviation grows larger if you have more cases…
–Doesn't allow comparison across samples

Turning the Deviation into a Useful Measure of Dispersion
Idea #3: The Average Absolute Deviation
–Calculate the sum of absolute deviations, divide by total N of cases
–Gives the average amount by which a case deviates from the mean
Formula: $AAD = \frac{1}{N} \sum_{i=1}^{N} |Y_i - \bar{Y}|$

Turning the Deviation into a Useful Measure of Dispersion
Digression: Here we have used the mean to determine "typical" size of case deviations
–Originally, I introduced the mean as a way to analyze actual case values (e.g. # of CDs owned)
–Now: Instead of looking at typical case values, we want to know what sort of deviation is typical
In other words, a statistic (the mean) is being used to analyze another statistic (a deviation)
–This is a general principle that we will use often: statistics can help us understand our raw data and also further summarize our statistical calculations!

Average Absolute Deviation
Example: Total Deviation = 90, N=4
–What is Average absolute deviation?
–Answer: 90 / 4 = 22.5
Advantages
–Very intuitive interpretation: Tells you how much cases differ from the mean, on average
Disadvantages
–Has less convenient mathematical properties than the variance, according to statisticians (for example, the absolute value function is not differentiable at zero)
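The average absolute deviation from this example can be reproduced in a few lines. The function name `average_absolute_deviation` is an assumption for this sketch, and the CD counts (20, 40, 0, 70) are inferred from the deviations and mean given in the lecture.

```python
def average_absolute_deviation(values):
    """Mean of |Y_i - Y-bar| over all N cases."""
    y_bar = sum(values) / len(values)
    return sum(abs(y - y_bar) for y in values) / len(values)


cds = [20, 40, 0, 70]  # inferred from deviations -12.5, 7.5, -32.5, 37.5
aad = average_absolute_deviation(cds)
print(aad)  # 22.5
```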

Turning the Deviation into a Useful Measure of Dispersion
Idea #4: Square the deviation to avoid problem of negative values
–Sum of "squared" deviation
–Divide by "N-1" (instead of N) to get the average
Result: The "variance": $s^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}$

Calculating the Variance 1–4
Case   Num CDs (Y)   Mean (Y bar)   Deviation (d)   Squared Deviation (d^2)
1      20            32.5           -12.5           156.25
2      40            32.5           7.5             56.25
3      0             32.5           -32.5           1056.25
4      70            32.5           37.5            1406.25

Calculating the Variance 5
Variance = Average of "squared deviation"
–Average = mean = sum up, divide by N
–In this case, use N-1
Sum of squared deviations = 156.25 + 56.25 + 1056.25 + 1406.25 = 2675
Divide by N-1
–N-1 = 4-1 = 3
Compute variance: 2675 / 3 = 891.67 = variance = $s^2$
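The variance calculation can be sketched step by step and cross-checked against Python's standard library. The CD counts (20, 40, 0, 70) are inferred from the deviations and mean of 32.5 given in the lecture; the function name `sample_variance` is hypothetical.

```python
from statistics import variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    y_bar = sum(values) / n
    squared_deviations = [(y - y_bar) ** 2 for y in values]
    return sum(squared_deviations) / (n - 1)


cds = [20, 40, 0, 70]
s_squared = sample_variance(cds)

print(round(s_squared, 2))      # 891.67
print(round(variance(cds), 2))  # same result from the standard library
```

`statistics.variance` also divides by N - 1, so it agrees with the hand calculation; `statistics.pvariance` divides by N instead and would give a smaller number.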

The Variance Properties of the variance –Zero if all points cluster exactly on the mean –Increases the further points lie from the mean –Comparable across samples of different size Advantages –1. Provides a good measure of dispersion –2. Better mathematical characteristics than the AAD Disadvantages: –1. Not as easy to interpret as AAD –2. Values get large, due to “squaring”

Turning the Deviation into a Useful Measure of Dispersion
Idea #5: Take square root of Variance to shrink it back down
Result: Standard Deviation
–Denoted by lower-case s
–Most commonly used measure of dispersion
Formula: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}}$

Calculating the Standard Deviation
Simply take the square root of the variance
Example:
–Variance = 891.67
–Square root of 891.67 ≈ 29.86
Properties:
–Similar to Variance
–Zero for perfectly concentrated distribution
–Grows larger if cases are spread further from the mean
–Comparable across different sample sizes
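A short sketch of the last step: taking the square root of the variance. As before, the CD counts (20, 40, 0, 70) are inferred from the lecture's deviations and mean, and the function name `sample_std_dev` is an assumption.

```python
import math
from statistics import stdev


def sample_std_dev(values):
    """Square root of the sample variance (the N - 1 version)."""
    n = len(values)
    y_bar = sum(values) / n
    s_squared = sum((y - y_bar) ** 2 for y in values) / (n - 1)
    return math.sqrt(s_squared)


cds = [20, 40, 0, 70]
s = sample_std_dev(cds)

print(round(s, 2))           # 29.86
print(round(stdev(cds), 2))  # standard-library equivalent
print(sample_std_dev([7, 7, 7]))  # 0.0 for a perfectly concentrated distribution
```

Unlike the variance, s is expressed in the original units (CDs rather than squared CDs), which is part of why it is the more commonly reported measure.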

Example 1: s = 21.72

Example 2: s = 67.62

Example 3: s =

Thinking About Dispersion
Suppose we observe that the standard deviation of wealth is greater in the U.S. than in Sweden…
–What can we conclude about the two countries?
Guess which group has a higher standard deviation for income: Men or Women? Why?
The standard deviation of a stock's price is sometimes considered a measure of "risk". Why?
Suppose we polled people on two political issues and the S.D. was much higher for one. What are some possible interpretations?
What are some other examples where the deviation would provide useful information?