Chapter 2 Frequency Distributions, Stem-and-leaf displays, and Histograms.



Where have we been?

To calculate SS, the variance, and the standard deviation: find the deviations from μ, square and sum them (SS), divide by N (σ²), and take the square root (σ).

Example: Scores on a Psychology quiz

Student     X    X − μ    (X − μ)²
John        7     1.00      1.00
Jennifer    8     2.00      4.00
Arthur      3    −3.00      9.00
Patrick     5    −1.00      1.00
Marie       7     1.00      1.00

ΣX = 30, N = 5, μ = 6.00
Σ(X − μ) = 0.00
Σ(X − μ)² = SS = 16.00
σ² = SS/N = 3.20
σ = √3.20 = 1.79
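The same calculation can be sketched in a few lines of Python. This is only an illustration of the steps listed above; the function name `population_stats` is made up for this example, and the scores are the five quiz scores from the slide.

```python
from math import sqrt

# Quiz scores from the example slide
scores = {"John": 7, "Jennifer": 8, "Arthur": 3, "Patrick": 5, "Marie": 7}

def population_stats(values):
    """Return (mu, SS, variance, standard deviation) for a whole population."""
    n = len(values)
    mu = sum(values) / n                      # the mean
    deviations = [x - mu for x in values]     # deviations from the mean
    ss = sum(d ** 2 for d in deviations)      # sum of squared deviations (SS)
    variance = ss / n                         # divide by N, not N - 1
    return mu, ss, variance, sqrt(variance)

mu, ss, variance, sd = population_stats(list(scores.values()))
print(f"mu = {mu:.2f}, SS = {ss:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
# prints: mu = 6.00, SS = 16.00, variance = 3.20, sd = 1.79
```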

Ways of showing how scores are distributed around the mean
– Frequency Distributions
– Stem-and-leaf displays
– Histograms

Some definitions
Frequency Distribution - a tabular display of the way scores are distributed across all the possible values of a variable.
Absolute Frequency Distribution - displays the count (how many there are) of each score.
Cumulative Frequency Distribution - displays the total number of scores at and below each score.
Relative Frequency Distribution - displays the proportion of each score.
Relative Cumulative Frequency Distribution - displays the proportion of scores at and below each score.

Example Data: Traffic accidents by bus drivers
– Studied 708 bus drivers, all of whom had worked for the company for the past 5 years or more.
– Recorded all accidents for the last 4 years.
– The data look like: 3, 0, 6, 0, 0, 2, 1, 4, 1, … 6, 0, 2

Frequency distributions – Absolute & cumulative frequency

# of accidents   Absolute Frequency   Cumulative Frequency
 0               117                  117
 1               157                  274
 2               158                  432
 3               115                  547
 4                78                  625
 5                44                  669
 6                21                  690
 7                 7                  697
 8                 6                  703
 9                 1                  704
10                 3                  707
11                 1                  708

To calculate absolute frequencies, tally and count the number of each kind of score. To calculate cumulative frequencies, add up the absolute frequencies of scores at or below each score (or possible score, if a score is missing).

Frequency distributions - relative frequencies
[Table columns: # of accidents, Absolute Frequency, Cumulative Frequency, Relative Frequency]
Calculate relative frequencies by dividing each absolute frequency by N, the total number of scores. (For example, 117/708 = .165.) Relative frequencies show the proportion of scores at each point. Note the rounding error: because each relative frequency is rounded, the column may not sum to exactly 1.000.

What pops out of such a display
Number of accidents = 0(117) + 1(157) + 2(158) + 3(115) + 4(78) + 5(44) + 6(21) + 7(7) + 8(6) + 9(1) + 10(3) + 11(1) = 1623.
18 drivers (about 2.5% of the drivers) had 7 or more accidents during the 4 years just before the study. Those 18 drivers caused 147 of the 1623 accidents, or very close to 9% of the accidents. Maybe they should be given eye/reflex exams?

What pops out of such a display
5 drivers (about .7% of the drivers) had 9 or more accidents during the 4 years just before the study. Those 5 drivers caused 50 of the 1623 accidents, or a little over 3% of the accidents. They should be given eye/reflex exams! Probably, they should be given desk jobs.
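The two observations above can be checked directly from the absolute frequencies. The sketch below is only an illustration; `tail_share` is a hypothetical helper name, and the frequencies are the ones given in the accident total on the earlier slide.

```python
# Absolute frequencies from the slides: number of accidents -> number of drivers
abs_freq = {0: 117, 1: 157, 2: 158, 3: 115, 4: 78, 5: 44,
            6: 21, 7: 7, 8: 6, 9: 1, 10: 3, 11: 1}

def tail_share(freq, cutoff):
    """Drivers at or above `cutoff` accidents, and the accidents they account for."""
    n_drivers = sum(freq.values())
    n_accidents = sum(score * count for score, count in freq.items())
    tail_drivers = sum(count for score, count in freq.items() if score >= cutoff)
    tail_accidents = sum(score * count for score, count in freq.items() if score >= cutoff)
    return (tail_drivers, tail_drivers / n_drivers,
            tail_accidents, tail_accidents / n_accidents)

for cutoff in (7, 9):
    drivers, p_drivers, accidents, p_accidents = tail_share(abs_freq, cutoff)
    print(f"{cutoff}+ accidents: {drivers} drivers ({p_drivers:.1%}), "
          f"{accidents} accidents ({p_accidents:.1%})")
```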

Frequency distributions - cumulative relative frequencies
[Table columns: # of accidents, Absolute Frequency, Cumulative Frequency, Cumulative Relative Frequency]
Calculate cumulative relative frequencies by dividing the number of scores at or below each possible score by N, the total number of scores. For example, the cumulative relative frequency of a score of 3 is 547/708 = .773. Cumulative relative frequencies show the proportion of scores at or below each score.
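All four kinds of frequency distribution can be built from the absolute frequencies alone. The following is a minimal sketch, assuming the bus-driver counts quoted in the slides; the function name `frequency_table` and the column layout are choices made for this example.

```python
from itertools import accumulate

# Absolute frequencies from the slides: number of accidents -> number of drivers
abs_freq = {0: 117, 1: 157, 2: 158, 3: 115, 4: 78, 5: 44,
            6: 21, 7: 7, 8: 6, 9: 1, 10: 3, 11: 1}

def frequency_table(freq):
    """Rows of (score, absolute, cumulative, relative, cumulative relative)."""
    scores = sorted(freq)
    n = sum(freq.values())
    # Running total of the absolute frequencies, lowest score first
    cumulative = accumulate(freq[s] for s in scores)
    return [(s, freq[s], cum, freq[s] / n, cum / n)
            for s, cum in zip(scores, cumulative)]

table = frequency_table(abs_freq)
print(f"{'Accidents':>9} {'Abs':>5} {'Cum':>5} {'Rel':>6} {'CumRel':>7}")
for score, absolute, cum, rel, cum_rel in table:
    print(f"{score:>9} {absolute:>5} {cum:>5} {rel:>6.3f} {cum_rel:>7.3f}")
```

Because each relative frequency is rounded to three places when printed, the printed column need not add up to exactly 1.000, which is the rounding error the slide mentions.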

Grouped Frequencies
Needed when
– the number of values is large OR
– the values are continuous.
To calculate group intervals
– First find the range.
– Determine a “good” interval based on the number of resulting intervals, the meaning of the data, and common, regular numbers.
– List intervals from largest to smallest.

Grouped Frequency Example 100 High school students’ average time in seconds to read ambiguous sentences. Values range between 2.50 seconds and 2.99 seconds.

Determining “i” (the size of the interval)
WHAT IS THE RULE FOR DETERMINING THE SIZE OF THE INTERVALS IN WHICH TO GROUP DATA? Use whatever intervals seem appropriate to present the data most informatively. It is a matter of judgment. Usually we use 6 – 12 same-size intervals, each of which uses an intuitively obvious endpoint such as 0 or 5.

Grouped Frequencies
Range = 2.995 − 2.495 = .50 (see real/apparent class limits, discussed below)
[Two tables, each with columns Reading Time and Frequency: one with i = .1, #i = 5; the other with i = .05, #i = 10]

Either is acceptable. Use whichever display seems most informative. In this case, the smaller intervals and 10-category table seem more informative. Sometimes it goes the other way, and a less detailed presentation is necessary to prevent the reader from missing the forest for the trees.

How you organize the data is up to you. When engaged in this kind of thing, there is often more than one way to organize the data. You should organize the data so that people can easily understand what is going on. Thus, the point is to use the grouped frequency distribution to provide a simplified description of the data.

Stem and Leaf Displays
Used when seeing all of the values is important.
Shows
– data grouped
– all values
– visual summary

Stem and Leaf Display
Reading time data (i = .05, #i = 10)

Reading Time   Leaves
2.95 – 2.99    5,5,6,6,6,6,8,8,9
2.90 – 2.94    0,0,1,2,3,3,3
2.85 – 2.89    5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.80 – 2.84    0,0,1,2,3,3,3,3,4,4,4
2.75 – 2.79    5,5,5,5,6,6,6,8,9,9
2.70 – 2.74    0,0,0,1,2,3,3,3,4,4
2.65 – 2.69    5,6,6,6
2.60 – 2.64    0,1,1,1,2,3,3,4
2.55 – 2.59    6,6,8,8,8,8,8,9,9,9
2.50 – 2.54    0,1,1,1,2,2,2,4,4,4,4

Stem and Leaf Display
Reading time data (i = .1, #i = 5)

Reading Time   Leaves
2.9            0,0,1,2,3,3,3,5,5,6,6,6,6,8,8,9
2.8            0,0,1,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.7            0,0,0,1,2,3,3,3,4,4,5,5,5,5,6,6,6,8,9,9
2.6            0,1,1,1,2,3,3,4,5,6,6,6
2.5            0,1,1,1,2,2,2,4,4,4,4,6,6,8,8,8,8,8,9,9,9
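Because a stem and leaf display keeps every value, the grouped frequency tables for either interval size can be recovered from it. The sketch below is an illustration under one loudly stated assumption: the stems are taken to run 2.9, 2.8, 2.7, 2.6, 2.5 from the first row of leaves to the last, following the "largest to smallest" rule given earlier. The function name `grouped_frequencies` is made up for this example.

```python
from collections import Counter

# Leaves copied from the i = .1 display. ASSUMPTION: the rows correspond to
# stems 2.9, 2.8, 2.7, 2.6, 2.5 in that order (stems stored in tenths).
leaves_by_stem = {
    29: [0, 0, 1, 2, 3, 3, 3, 5, 5, 6, 6, 6, 6, 8, 8, 9],
    28: [0, 0, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6,
         7, 7, 7, 7, 7, 7, 7, 8, 9, 9, 9, 9],
    27: [0, 0, 0, 1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 8, 9, 9],
    26: [0, 1, 1, 1, 2, 3, 3, 4, 5, 6, 6, 6],
    25: [0, 1, 1, 1, 2, 2, 2, 4, 4, 4, 4, 6, 6, 8, 8, 8, 8, 8, 9, 9, 9],
}

# Work in hundredths of a second (integers) to avoid floating-point surprises
times = [stem * 10 + leaf
         for stem, leaves in leaves_by_stem.items()
         for leaf in leaves]

def grouped_frequencies(values, width):
    """Count values per interval of `width` hundredths, keyed by the lower
    apparent limit and listed from the largest interval to the smallest."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items(), reverse=True))

for width in (10, 5):  # i = .1 and i = .05
    print(f"i = {width / 100:.2f}")
    for lower, count in grouped_frequencies(times, width).items():
        upper = lower + width - 1
        print(f"  {lower / 100:.2f}-{upper / 100:.2f}  {count}")
```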

Purely figural displays of frequency data

Bar graphs Bar graphs are used to show frequency of scores when you have a discrete variable. Discrete data can only take on a limited number of values. Numbers between adjoining values of a discrete variable are impossible or meaningless. Bar graphs show the frequency of specific scores or ranges of scores of a discrete variable. The proportion of the total area of the figure taken by a specific bar equals the proportion of that kind of score. Note, in this context proportion and relative frequency are synonymous.

[Bar graph: the results of rolling a six-sided die 120 times. It came out 20 ones, 20 twos, etc.]

Bar graphs and Histograms
Use bar graphs, not histograms, for discrete data. (The bars don’t touch in a bar graph; they do in a histogram.) You rarely see data that are really discrete. Discrete data are almost always categories or rankings. ANYTHING ELSE IS ALMOST CERTAINLY A CONTINUOUS VARIABLE. Use histograms for continuous variables. AGAIN, almost every score you will obtain reflects the measurement of a continuous variable.

A stem and leaf display turned on its side shows the transition to purely figural displays of a continuous variable

Histogram of reading times: notice how the bars touch at the real limits of each class!
[Figure axes: Reading Time (seconds) vs. Frequency]

Histogram concepts - 1 Histograms must be used to display continuous data. Most scores obtained by psychologists are continuous, even if the scores are integers. WHAT COUNTS IS WHAT YOU ARE MEASURING, NOT THE PRECISION OF MEASUREMENT. INTEGER SCORES IN PSYCHOLOGY ARE USUALLY ROUGH MEASUREMENTS OF CONTINUOUS VARIABLES.

Example and question You give a Psych Quiz with ten questions. Scores can be 0,1,2,3,4,5,6,7,8,9, or 10. Are the resulting scores discrete or continuous data?

Answer to example
While scores on a ten-question multiple choice intro psych quiz (1, 2, … 10) are integers, you are measuring knowledge, which is a continuous variable that could be measured with 10,000 questions, each counting .001 points, or with 1,000,000 questions, each worth .00001 points. You measure at a specific level of precision because that’s all you need or can afford. Logistics, not the nature of the variable, constrain the measurement of a continuous variable.

Histogram concepts - 2 If you have continuous data, you can use histograms, but remember real class limits. Histograms can be used for relative frequencies as well. Histograms can be used to describe theoretical distributions as well as actual distributions.

Theoretical Histograms

Displaying theoretical distributions is the most important function of histograms. Theoretical distributions show how scores can be expected to be distributed around the mean.

TYPES OF THEORETICAL DISTRIBUTIONS
Distributions are named after the shapes of their histograms. For psychologists, the most important are:
– Rectangular
– J-shaped
– Bell (Normal)
– t distributions - close to bell shaped, but a little flatter

Rectangular Distribution of scores

The rectangular distribution is the “know nothing” distribution.
Our best prediction is that everyone will score at the mean. But in a rectangular distribution, scores far from the mean occur as often as do scores close to the mean. So the mean tells us nothing about where the next score will fall (or how the next person will behave). We know nothing in that case.

Flipping a coin: Rectangular distributions are frequently seen in games of chance, but rarely elsewhere. 100 flips - how many heads and tails do you expect?
[Figure categories: Heads, Tails]

Rolling a die 120 rolls - how many of each number do you expect?

Which distribution is this?

RECTANGULAR!

What happens when you sample two scores at a time? All of a sudden things change. The distribution of scores begins to resemble a normal curve!!!! The normal curve is the “we know something” distribution, because most scores are close to the mean.

Rolling 2 dice
Look at the histogram to see how this resembles a bell shaped curve.
[Table columns: Dice Total, Absolute Freq, Relative Frequency]

Rolling 2 dice 360 rolls
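The theoretical distribution for the total of two fair dice can be worked out by listing all 36 equally likely outcomes. The sketch below is an illustration rather than a reproduction of the slide's table; it uses exact fractions, and multiplies each theoretical relative frequency by the 360 rolls mentioned above to get expected frequencies.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution():
    """Theoretical relative frequency of each total for two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(count, 36) for total, count in sorted(counts.items())}

n_rolls = 360
dist = two_dice_distribution()
print("Total  RelFreq  Expected in 360 rolls")
for total, rel in dist.items():
    expected = rel * n_rolls          # expected frequency = relative frequency x N
    print(f"{total:>5}  {float(rel):>7.3f}  {float(expected):>8.0f}")
```

The expected frequencies rise from 10 for a total of 2 to 60 for a total of 7 and fall back to 10 for a total of 12, which is the peaked shape the slide points to.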

Normal Curve

J Curve
Occurs when socially normative behaviors are measured. Most people follow the norm, but there are always a few outliers.

What does the J shaped distribution represent?
The J shaped distribution represents situations in which almost everyone does about the same thing. These are unusual social situations with very clear contingencies. For example, how long do cars without handicapped plates park in a handicapped spot when there is a cop standing next to the spot? Answer: Zero minutes! So, the J shaped distribution is the “we know almost everything” distribution, because we can predict how a large majority of people will behave.

Principles of Theoretical Curves
– Expected frequency = Theoretical relative frequency × N
– Expected frequencies are your best estimates because they are closer, on the average, than any other estimate when we square the difference between observed and predicted frequencies.
– Law of Large Numbers - The more observations that we have, the closer the relative frequencies we actually observe should come to the theoretical relative frequency distribution.
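A small simulation can illustrate the first and last principles for a single fair die. This is a sketch under assumptions, not part of the original slides: the die is simulated with Python's pseudo-random generator, the seed and sample sizes are arbitrary, and the helper names are made up for this example.

```python
import random
from collections import Counter

FACES = range(1, 7)            # a fair six-sided die
THEORETICAL = 1 / len(FACES)   # theoretical relative frequency of each face

def expected_frequencies(n):
    """Expected frequency of each face = theoretical relative frequency x N."""
    return {face: n / len(FACES) for face in FACES}

def max_deviation(n, rng):
    """Largest gap between an observed and the theoretical relative frequency."""
    counts = Counter(rng.choice(FACES) for _ in range(n))
    return max(abs(counts[face] / n - THEORETICAL) for face in FACES)

print(expected_frequencies(120))   # 20 of each face in 120 rolls

rng = random.Random(0)             # arbitrary seed, only for repeatability
for n in (120, 12_000, 120_000):
    print(f"N = {n:>7}: largest deviation = {max_deviation(n, rng):.4f}")
```

With more rolls the largest deviation should tend to shrink, although any single run can buck the trend, since the Law of Large Numbers describes what to expect on average.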