Distributions When comparing two groups of people or things, we can almost never rely on a single comparison. Example: Are men taller than women?


Distributions We almost always measure several or many representative people or things. We also almost never measure every person or thing; instead, we measure some of them. The “some of them” that you measure is called a sample, because we have “sampled” from the entire population.

Distributions The population is every possible person or thing that could have been part of the sample (e.g. all of the men in the world, all of the women, etc.). We can tell a lot about a population by looking at a sample (e.g. you don’t need to eat a whole container of ice cream to know if you like it!).

Distributions When you measure several different things you get (no surprise!) different numbers. We say that those numbers are distributed.

Distributions A distribution is a set of numbers. –Examples: the heights of the men in the room, the heights of the women in the room, the ages in the room, the scores on the mid-term, etc.

Distributions Looking at distributions: –We often conceptualize distributions by graphing them with a probability density function [Figure: axes labelled “How Many?” and “Ages”]

Distributions Looking at distributions: –Here’s an example of a “normal” distribution [Figure: axes labelled “How Many?” and “Ages”]

Distributions Looking at distributions: –Here’s an example of a “rectangular” distribution [Figure: axes labelled “How Many?” and “Birthdays”]

Distributions key insight: The measurements in a sample are distributed because the population is distributed. Ponder this: the more people or things in your sample, the more your sample is like the entire population. –It’s like “sampling” ice cream with a really big spoon
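The “bigger spoon” idea can be tried out numerically. The sketch below is only an illustration: the population is made-up (10,000 invented heights in cm), and the names `sample_mean`, `population`, and `population_mean` are assumptions, not anything from the slides.

```python
import random
from statistics import fmean


def sample_mean(population, n, seed=0):
    """Mean of n measurements drawn without replacement from the population."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return fmean(rng.sample(population, n))


# Hypothetical population: 10,000 invented heights in cm (not real data).
_rng = random.Random(42)
population = [round(_rng.gauss(170, 10)) for _ in range(10_000)]
population_mean = fmean(population)

# Take bigger and bigger "spoonfuls" and compare each to the whole population.
for n in (5, 50, 500, 5_000, len(population)):
    gap = abs(sample_mean(population, n) - population_mean)
    print(f"n = {n:>6}: sample mean is off by {gap:.3f} cm")
```

The gap tends to shrink as n grows, although not necessarily at every step, and it reaches zero when the sample is the entire population.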

Describing Distributions It’s no good to just have a pile of numbers; we need a way of summarizing the characteristics of the distribution. What are some ways to describe a distribution?

Describing Distributions All distributions have a sum –We could just add up the samples and talk about, for example, the total height of the men and the total height of the women in the room. –What’s the problem with this approach?

Describing Distributions All distributions have a mean (a.k.a. the average) –The mean is the normalized sum: it is adjusted for the number of measurements in the sample –How do we do that? –Divide the sum by the number in the sample

“The” Mean

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

$\bar{x}$ is pronounced “x bar” and means “the mean”; $x_1$ is measurement number 1; $x_n$ is the last measurement in the distribution (of n measurements); $x_i$ is any one of the measurements (you can fill in the i with any number between 1 and n); $\sum$ means “add these up”.

“The” Mean “x bar” (the mean) = sum of the sample ÷ number of measurements
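As a minimal sketch of “sum of the sample divided by the number of measurements”, with made-up heights and a hypothetical function name `mean`:

```python
def mean(xs):
    """x-bar: the sum of the sample divided by the number of measurements."""
    if not xs:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(xs) / len(xs)


heights_cm = [160, 170, 175, 180, 165]  # made-up illustrative values
print(mean(heights_cm))  # 170.0
```

Python’s standard library offers the same calculation as `statistics.mean`.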

Properties of the Mean Every value is some distance from the mean. This distance is called a “deviation score”: deviation score = $x_i - \bar{x}$

Properties of the Mean The mean is the point from which the sum of deviation scores is zero. This means that the mean is like a balancing point: all the scores below the mean are balanced by the scores above the mean.
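The balancing-point property is easy to check on a small sample. This sketch uses the numbers 1, 2, 3, 100, 4; the function name `deviation_scores` is an assumption for illustration.

```python
def deviation_scores(xs):
    """Distance of every value from the mean (negative below, positive above)."""
    x_bar = sum(xs) / len(xs)
    return [x - x_bar for x in xs]


sample = [1, 2, 3, 100, 4]
devs = deviation_scores(sample)
print(devs)       # [-21.0, -20.0, -19.0, 78.0, -18.0]
print(sum(devs))  # 0.0
```

The four scores below the mean add up to -78, which is exactly balanced by the single score 78 above it.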

Properties of the Mean The sum of the squared deviations from the mean is smaller than from any other number: $\sum_{i=1}^{n} (x_i - \bar{x})^2 < \sum_{i=1}^{n} (x_i - Y)^2$, where Y is any other number
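A quick numerical check of this property, offered as a sketch rather than a proof. The candidate values of Y are arbitrary, and `sum_squared_deviations` is a hypothetical helper name.

```python
def sum_squared_deviations(xs, anchor):
    """Sum of squared distances of every value from the chosen anchor."""
    return sum((x - anchor) ** 2 for x in xs)


sample = [1, 2, 3, 100, 4]
x_bar = sum(sample) / len(sample)  # 22.0
at_mean = sum_squared_deviations(sample, x_bar)
print(at_mean)  # 7610.0

# Any other anchor Y gives a larger total, even one very close to the mean.
for y in (0, 3, 21.9, 22.1, 50):
    print(y, sum_squared_deviations(sample, y) > at_mean)  # True each time
```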

Properties of the Mean The mean is the number that, when n copies of it are added together, gives you the sum of the numbers in the sample: $n\bar{x} = \sum_{i=1}^{n} x_i$

“Other” Means Sometimes just adding the numbers in the sample and dividing by n gives you a number that doesn’t really describe the n numbers –for example: a sine wave that swings between +1 and -1 has $\sum x_i = 0$!

“Other” Means Root-Mean-Square (RMS): first square the scores, then take the mean of the squares, then take the square root to undo the squaring: $x_{\mathrm{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$
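The sine-wave example can be reproduced in a few lines. This is a sketch under the assumption that the wave has amplitude 1 and is sampled at 1,000 evenly spaced points over one full cycle; the names `rms` and `sine_wave` are illustrative.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def rms(xs):
    """Root-mean-square: square each score, average the squares, take the root."""
    return math.sqrt(mean([x * x for x in xs]))


n = 1000
# One full cycle of a sine wave with amplitude 1 (assumed for illustration).
sine_wave = [math.sin(2 * math.pi * k / n) for k in range(n)]

print(abs(mean(sine_wave)) < 1e-9)  # True: the ordinary mean is essentially zero
print(round(rms(sine_wave), 4))     # 0.7071
```

The ordinary mean says nothing about how big the swings are, while the RMS comes out near 0.707 (that is, 1 divided by the square root of 2) for a wave of amplitude 1.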

Other Descriptions of a Distribution: the Median The mean is sensitive to outliers –e.g. 1, 2, 3, 100, 4 –mean = 110/5 = 22 … not particularly representative of the numbers in the sample

Other Descriptions of a Distribution: the Median Another descriptive statistic, the median, is less sensitive to outliers –the median is the ordinal middle of the sample: half of the measurements lie below the median and half of the measurements lie above it –in other words, it is the 50th percentile

Other Descriptions of a Distribution: the Median for example: –1, 2, 3, 100, 4 put into rank order is… –1, 2, 3, 4, 100 –so the middle number (obviously) is 3 (remember that the mean was 22!)

Other Descriptions of a Distribution: the Median if n is even, take the average of the two middle numbers: –1, 2, 3, 100, 4, 5 put into rank order is… –1, 2, 3, 4, 5, 100 –so the middle number is the average of 3 and 4 = 3.5

Other Descriptions of a Distribution: the Median the median is not sensitive to outliers –notice the median of 1, 2, 3, 4, 5 = the median of 1, 2, 3, 4, 100 = 3
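The rank-order procedure for odd and even n can be sketched as follows, using the same numbers as the examples above. The function name `median` is chosen for illustration; the standard library’s `statistics.median` does the same job.

```python
def median(xs):
    """Ordinal middle of the sample: rank the values, then take the middle one
    (or the average of the two middle ones when n is even)."""
    ranked = sorted(xs)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([1, 2, 3, 100, 4]))     # 3
print(median([1, 2, 3, 100, 4, 5]))  # 3.5
print(median([1, 2, 3, 4, 5]))       # 3
```

Swapping the 5 for 100 leaves the median at 3, whereas the mean jumps from 3 to 22.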

Measures of Variability What’s not so good about using the mean to describe a distribution?

Measures of Variability Example: similar mean temperature in Vancouver and Lethbridge on Sept

Measures of Variability Example: BUT the distribution of temperatures is quite different for the two cities

Measures of Variability The range is the highest number minus the lowest number, e.g. for X = {1, 3, 23, 45, 62} the range is 62 - 1 = 61

Measures of Variability Notice that the range doesn’t tell you much about the distribution of numbers. –it doesn’t tell you where the distribution is located (the mean) –it doesn’t tell you how the numbers relate to each other: e.g. 1, 48, 49, 50, 51, 52, 100 has a range of 99!
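A short sketch of the range and of its blind spot. The third list is a made-up, evenly spread sample added for contrast, and `value_range` is a hypothetical name (chosen to avoid shadowing Python’s built-in `range`).

```python
def value_range(xs):
    """Highest number minus the lowest number."""
    return max(xs) - min(xs)


print(value_range([1, 3, 23, 45, 62]))            # 61
print(value_range([1, 48, 49, 50, 51, 52, 100]))  # 99: tightly clustered, two outliers
print(value_range([1, 17, 34, 50, 67, 84, 100]))  # 99: evenly spread (made-up values)
```

The last two samples are spaced very differently, yet the range cannot tell them apart.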

Measures of Variability What’s needed is a measure of the “distance” between the numbers in the distribution - how spread apart are they from each other

Measures of Variability Question: How tightly or loosely spaced are the cities?

$D^2$ One approach would be to calculate the distances between each pair of cities [Table: signed distances between every pair of cities, with rows and columns both listing Vancouver, Hope, Cache Creek, Kamloops, Salmon Arm, Revelstoke, Lake Louise, Banff, Calgary, Medicine Hat, Swift Current. Entries revealed slide by slide, in order: 0, 150, 343, -150, 0, 193]

$D^2$ Notice that there are $n \times n = n^2$ pairs

$D^2$ If you sum up all the differences between numbers you get… ZERO: every difference $x_i - x_j$ is cancelled by its mirror image $x_j - x_i$

$D^2$ What does a statistician do when things sum to zero? Square everything first, then sum them, then take the square root.

$D^2$ is the sum of the squared differences, $D^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^2$, and $D$ is the square root of $D^2$

$D^2$ What is the problem with using $D$ or $D^2$? If n is “pretty big”, $n^2$ will be huge!
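The pairwise approach can be sketched on a tiny example. The three positions below are illustrative numbers along a road in km, and the names `pairwise_differences`, `d_squared`, and `d` are assumptions for this sketch only.

```python
import math
from itertools import product


def pairwise_differences(xs):
    """All n * n signed differences x_i - x_j, including each value with itself."""
    return [a - b for a, b in product(xs, repeat=2)]


positions = [0, 150, 343]  # illustrative positions in km
diffs = pairwise_differences(positions)

print(len(diffs))  # 9, i.e. n * n pairs for n = 3
print(sum(diffs))  # 0: every difference is cancelled by its mirror image

d_squared = sum(diff * diff for diff in diffs)  # square first, then sum
d = math.sqrt(d_squared)                        # then take the square root
print(d_squared)  # 354796
```

With only 3 values there are 9 pairs, but 1,000 values would already need 1,000,000 of them, which is the problem the slides point out.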

$S^2$: a better choice Select a representative “anchor point” and just measure distance from that point. For example, measure distances relative to Calgary.

Notice there are some negative distances. We don’t care about the sign of the distances; we just care about the distances themselves.

$S^2$: a better choice $S^2$ (called the variance) is like $D^2$ except it uses a single “anchor point” (like measuring distances from Calgary). That anchor point is the mean.
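A minimal sketch of the anchor-point idea follows. It rests on one loudly stated assumption: the squared distances are averaged by dividing by n. The formula on the original slide is not preserved in this transcript, and many textbooks divide by n - 1 for a sample instead. The data and the name `variance` are illustrative.

```python
from statistics import pvariance


def variance(xs):
    """Average squared distance from the anchor point (the mean).

    Assumption: divides by n. The slide's own formula is not available here,
    and sample formulas often divide by n - 1 instead.
    """
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / len(xs)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5
print(variance(sample))            # 4.0
print(pvariance(sample))           # 4: the standard library's divide-by-n version agrees
```

Only n distances are needed here, one per value, instead of the n * n pairs used by the pairwise approach.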

$S$: the standard deviation The standard deviation of a distribution of values is the square root of the variance: $S = \sqrt{S^2}$

S: the standard deviation That can be rewritten this way for using a calculator:

Next Time Transforming Scores (chapter 4) We begin significance testing (chs. 11, 12, 13, 14)