Statistics Frequencies


Statistics Frequencies https://www.123rf.com/photo_6622261_statistics-and-analysis-of-data-as-background.html

Histograms Many bar/column charts display count data The counts shown in each category are called “frequencies”

Histograms There are a lot of graphs that specialize in showing frequencies These are called “histograms” There are several popular types of histograms

Histograms Dot plot (automatic graph)

Histograms What month is your birthday?

Histograms Fancy dot plot using pictographs:

Histograms

Histograms Living Histogram

Histograms Stem and leaf plot Also changes the data into a bar graph For measurement data Lets you see the original data values

Histograms The stem is usually the leftmost digit(s) The leaf is the rightmost digit (the "ones")

Histograms It forms a sort of dot plot… But the data are still there!
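
A minimal Python sketch of how a stem and leaf plot could be built, assuming whole-number data where the leaf is the ones digit and the stem is everything to its left. The function name `stem_and_leaf` and the sample values are made up for illustration; they are not from the slides.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last) and leaf (ones digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 47 -> stem 4, leaf 7
        groups[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical sample data (not from the slides)
sample = [12, 15, 21, 23, 23, 30, 47]
for line in stem_and_leaf(sample):
    print(line)
```

Each row reads like a bar made of digits: the longer the row, the more observations share that stem, and the original values can still be read back from the leaves.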

Histograms More stem and leaf:

Histograms Comparative stem and leaf

FREQUENCIES IN-CLASS PROBLEM You are doing research on traffic offenses in Denver. Your first research objective is to find out if Franklin Street’s speed limit should be greater than 25 mph. You start by sampling 30 speeding tickets from this street and record the speed.

FREQUENCIES IN-CLASS PROBLEM What is the population? What is the variable? Is the variable Qualitative or Quantitative?

FREQUENCIES IN-CLASS PROBLEM Create a Stem and Leaf for the raw data: 48 92 50 29 40 129 43 108 39 42 57 104 83 45 81 123 38 67 32 65 46 80 100 98

Questions?

Frequencies Types of frequencies: Absolute frequency – the number of observations that fall in a certain category

Frequencies A table of absolute frequencies is called a frequency distribution

Frequencies Data table: A B A B A C B B
Frequency distribution: A: 3 B: 4 C: 1 (histogram shown on slide)
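
As a rough sketch, the same absolute frequencies can be tallied in Python with `collections.Counter`. The row of asterisks is just an improvised text stand-in for the histogram bars.

```python
from collections import Counter

data = ["A", "B", "A", "B", "A", "C", "B", "B"]  # data table from the slide

# Absolute frequency: number of observations in each category
freq = Counter(data)

for category in sorted(freq):
    # A row of asterisks gives a rough text-only bar chart
    print(f"{category}: {freq[category]} {'*' * freq[category]}")
```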

Questions?

Measurement Frequencies So… what if your data are measurements rather than counts?

Measurement Frequencies Often we change the measurements into counts These derived counts are also “frequencies”

Measurement Frequencies We can change measured data to categories by splitting the continuum into named categories

Measurement Frequencies
Sale price (in thousand $): 8.0 – 11.0, 11.1 – 14.1, 14.2 – 17.2, 17.3 – 20.3, 20.4 – 23.4, 23.5 – 26.5
Minutes Internet Usage: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61+
Years of experience: 1-2, 3-4, 5-6, 7-8, 9-10, 11+

Measurement Frequencies The counts of observations falling in these user-manufactured categories are still called “frequencies”

Measurement Frequencies A bar graph of frequencies in user-manufactured categories is still called a “histogram”

Measurement Frequencies It is less confusing to viewers to keep the numerical categories the same width

Measurement Frequencies Numerical categories should not overlap Numerical categories should not leave any blank spaces in the continuum

Measurement Frequencies Numerical categories are also called “classes”

Measurement Frequencies For numerical categories, the maximum and minimum values in each category are called the “class limits”

Measurement Frequencies For numerical categories, the range of values included in each category is called the “width”

Measurement Frequencies The middle of each numerical category is called the “midpoint” Add the maximum and minimum (class limits) and divide by 2
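
A small sketch of the width and midpoint arithmetic, assuming whole-number classes given as (lower limit, upper limit) pairs. The helper names are hypothetical, and the width here counts both class limits, so "1-10" is 10 wide.

```python
def class_width(lower, upper):
    """Width of a whole-number class, counting both limits (e.g. 1-10 is 10 wide)."""
    return upper - lower + 1

def class_midpoint(lower, upper):
    """Midpoint: add the class limits and divide by 2."""
    return (lower + upper) / 2

# Classes from the "Minutes Internet Usage" example
classes = [(1, 10), (11, 20), (21, 30)]
for lower, upper in classes:
    print(f"{lower}-{upper}: width {class_width(lower, upper)}, "
          f"midpoint {class_midpoint(lower, upper)}")
```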

Measurement Frequencies Rounding may move observed values into different numerical categories The actual maximum and minimum values that end up in a given numerical category are called the “class boundaries”

Measurement Frequencies We want at least 5 categories This allows us to pretend the data is still “continuous” (one of those statistical things)

Measurement Frequencies For psychological reasons, we usually limit the number of categories to a maximum of 8 Typically the human brain can compare only 7-8 things before becoming overloaded

Measurement Frequencies So you should aim for 5-8 classes with “kinda-nice” class limit values

FREQUENCIES IN-CLASS PROBLEM Which would be better?
Table 1 (Minutes Internet Usage, Number of Users): 1-15, 16-30, 31-45, 46-60, 61-75, 76-90, 91-105, 106-120, 121+
Table 2 (Minutes Internet Usage, Number of Users): 1-20, 21-40, 41-60, 61-80, 81-100, 101-120, 121+

FREQUENCIES IN-CLASS PROBLEM Create a frequency distribution: 48 92 50 29 40 129 43 108 39 42 57 104 83 45 81 123 38 67 32 65 46 80 100 98

Questions?

Measurement Frequencies Open the data set on “InClass-Internet” Your assignment: create a chart for this data What could you do?

Measurement Frequencies Bar chart? Yuck! Try an x-y plot!

Measurement Frequencies Still yuck! Now what???

Measurement Frequencies The numbers are not in any particular order – the dots don’t tell a story about the data (other than that it’s messy and disorganized…)

Measurement Frequencies A better graph would group the numbers into a meaningful pattern that will answer an interesting question we might have about the data

Measurement Frequencies Let’s sort the data! In Excel, first highlight just the numbers Then click on the “Data” tab

Measurement Frequencies Click on “Sort” Use the “A->Z” sort to go from low to high

Measurement Frequencies Poof! Little numbers on top, big numbers on the bottom Try a bar graph now…

Measurement Frequencies Not as ugly… but still no story!

Measurement Frequencies You need to consider what question you are trying to answer with the data

Measurement Frequencies What you might want to show is: “Do people spend a lot of time on the Internet? How much?”

Measurement Frequencies To show this, it would make sense to create categories from this quantitative data! (Believe it or not…)

Measurement Frequencies How many minutes is “not many”? 5? 10? 15? Let’s say “10 or fewer” That becomes our first category: “1-10”

Measurement Frequencies Type in '1-10 (with a leading apostrophe), or Excel will change it into a date

Measurement Frequencies Start a “Summary Table” of categories and how many observations fall into each one: Minutes Internet Usage Number of Users 1-10 3

Measurement Frequencies What would be the next category? It could be anything you decide… but…

Measurement Frequencies It is less confusing to viewers to keep the categories the same width

Measurement Frequencies In general: Categories should usually be the same width Categories should not overlap Categories should not leave any blank spaces in the continuum
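
One way to sanity-check a set of categories against these three rules is sketched below. It assumes whole-number classes written as (lower, upper) pairs; `check_classes` is a made-up helper, not something from Excel or the slides.

```python
def check_classes(classes):
    """Return a list of problems found in whole-number classes given as (lower, upper) pairs."""
    problems = []
    ordered = sorted(classes)
    # Rule 1: same width (counting both limits)
    widths = {upper - lower + 1 for lower, upper in ordered}
    if len(widths) > 1:
        problems.append("classes are not all the same width")
    # Rules 2 and 3: compare each class with the next one up
    for (lo1, hi1), (lo2, hi2) in zip(ordered, ordered[1:]):
        if lo2 <= hi1:
            problems.append(f"{lo1}-{hi1} overlaps {lo2}-{hi2}")
        elif lo2 > hi1 + 1:
            problems.append(f"gap between {lo1}-{hi1} and {lo2}-{hi2}")
    return problems

print(check_classes([(1, 10), (11, 20), (21, 30)]))  # well-formed classes
print(check_classes([(1, 10), (10, 20), (25, 30)]))  # unequal widths, an overlap, and a gap
```

An open-ended last class such as "121+" has no upper limit, so it would need to be left out of this check or handled separately.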

Measurement Frequencies Your previous category was “1-10 minutes” Its width is: 10-1 +1 = 10

Measurement Frequencies So, the next category should be right next to “1-10”, not overlap with “1-10”, and be 10 wide: “11-20”

Measurement Frequencies Continuing, our categories will be:
Minutes Internet Usage   Number of Users
1-10                     3
11-20
21-30
31-40
41-50
51-60
etc...

Measurement Frequencies But… that’s still A LOT of categories! Our data goes up to 123 minutes!! The graph will be better than the original, but still cluttered!

Measurement Frequencies Remember… the human brain can compare only 7-8 things before becoming overloaded

Measurement Frequencies We need to redo our categories to have only about 7 or 8 of them

Measurement Frequencies Our data goes from a minimum of 5 to a maximum of 123 If we had only one category, it would have a width: 123-5 +1 = 119

Measurement Frequencies If we split the 119 into 7 equal pieces: 119/7 = 17 17 is not a very “nice” number for category splits 15 or 20 would be “evener”
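
The arithmetic above can be sketched in Python. The minimum (5), maximum (123), and target of 7 categories come from the slides; rounding to the nearest multiples of 5 is an assumption about what counts as a "nice" number, and both function names are hypothetical.

```python
import math

def raw_class_width(minimum, maximum, n_classes):
    """Spread of whole-number data (counting both ends) split into equal classes."""
    spread = maximum - minimum + 1
    return spread / n_classes

def nice_widths(raw_width, step=5):
    """Nearest multiples of `step` at or below and at or above the raw width."""
    lower = math.floor(raw_width / step) * step
    upper = math.ceil(raw_width / step) * step
    return lower, upper

raw = raw_class_width(5, 123, 7)  # min 5, max 123, 7 classes
print(raw)                         # 17.0
print(nice_widths(raw))            # (15, 20)
```

The two candidates, 15 and 20, are the widths compared in the two tables that follow.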

Measurement Frequencies Which would be better?
Table 1 (Minutes Internet Usage, Number of Users): 1-15, 16-30, 31-45, 46-60, 61-75, 76-90, 91-105, 106-120, 121+
Table 2 (Minutes Internet Usage, Number of Users): 1-20, 21-40, 41-60, 61-80, 81-100, 101-120, 121+

Measurement Frequencies Now we have to get the number of users for each of these categories:
Minutes Internet Usage   Number of Users
1-20
21-40
41-60
61-80
81-100
101-120
121+

Measurement Frequencies How many observations fall in this first category? Highlight the observations that are 20 or less

Measurement Frequencies The number of observations is

Measurement Frequencies The number of observations in that category is the frequency
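
Instead of highlighting and counting by hand, the tally could be sketched in Python as below. The `minutes` list is invented for illustration (it is not the "InClass-Internet" data set), and `frequency_table` is a hypothetical helper that assumes whole-number values of 1 or more.

```python
def frequency_table(values, width=20, n_closed=6):
    """Count whole-number values (>= 1) in classes 1-20, 21-40, ... plus an open last class."""
    labels = [f"{i * width + 1}-{(i + 1) * width}" for i in range(n_closed)]
    labels.append(f"{n_closed * width + 1}+")
    counts = [0] * len(labels)
    for v in values:
        # Anything past the last closed class lands in the open-ended class
        index = min((v - 1) // width, n_closed)
        counts[index] += 1
    return list(zip(labels, counts))

# Hypothetical minutes of Internet usage (not the class data set)
minutes = [5, 12, 20, 21, 35, 40, 58, 61, 77, 123]
for label, count in frequency_table(minutes):
    print(f"{label:>8} {count}")
```

Subtracting 1 before the integer division keeps the upper class limit (20, 40, and so on) in the lower class, which matches categories like "1-20" and "21-40".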

Measurement Frequencies Let’s graph these new categorical data:
Minutes Internet Usage   Number of Users
1-20                     9
21-40                    18
41-60                    15
61-80                    8
81-100                   1
101-120
121+

Measurement Frequencies Much better! The graph now tells a story

Measurement Frequencies You can also now see that the value “123” is an “outlier”

Measurement Frequencies Outliers have one or more empty (zero count) categories between their category and the others
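
The "empty category in between" idea can be sketched as a simple check on a list of category counts. This is only an illustration of the rule of thumb on this slide, not a standard outlier test; the function name and the counts are hypothetical.

```python
def flag_outlier_categories(counts):
    """Return indexes of non-empty categories separated from the main body of the data
    by at least one empty (zero count) category."""
    # Split the category indexes into runs of consecutive non-empty categories
    runs, current = [], []
    for i, count in enumerate(counts):
        if count > 0:
            current.append(i)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    if len(runs) < 2:
        return []
    # Treat the run holding the most observations as the main body
    main = max(runs, key=lambda run: sum(counts[i] for i in run))
    return [i for run in runs if run is not main for i in run]

# Hypothetical counts for seven categories; the last one sits alone after two empty ones
counts = [4, 7, 5, 2, 0, 0, 1]
print(flag_outlier_categories(counts))  # [6]
```

A flagged category is only a candidate: someone still has to decide whether its values are mistakes or valid data.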

Measurement Frequencies Outliers can be a problem in statistical analysis You have to decide whether the value is truly an outlier and should be eliminated or a valid extension of the data

Measurement Frequencies Another option:

Questions?

Frequencies A relative frequency is the fraction or percent of observations that fall in each category

Frequencies You first find the total sample size (n) by adding up all of the counts in each category

Frequencies Then divide each category count by n

Frequencies You can make these percentages by multiplying by 100 (or just clicking the % sign on the Excel ribbon)

Frequencies Data table (n = 8): A B A B A C B B
Relative frequency distribution: A: 3/8 B: 4/8 C: 1/8 (histogram shown on slide)
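
The same steps (count, total up n, divide, convert to percent) can be sketched in Python. The variable names are arbitrary.

```python
from collections import Counter

data = ["A", "B", "A", "B", "A", "C", "B", "B"]  # data table from the slide

counts = Counter(data)
n = sum(counts.values())  # total sample size: add up the counts in every category

# Relative frequency: each category count divided by n
relative = {category: count / n for category, count in counts.items()}

for category in sorted(relative):
    # The ".1%" format multiplies by 100 and adds the percent sign
    print(f"{category}: {counts[category]}/{n} = {relative[category]:.1%}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check that no category was missed.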

Frequencies Notice the shapes of the absolute frequency and relative frequency graphs are the same

Frequencies Because we see % more easily in a pie chart, relative frequencies are often shown in this format

FREQUENCIES IN-CLASS PROBLEM Create a relative frequency distribution: 48 92 50 29 40 129 43 108 39 42 57 104 83 45 81 123 38 67 32 65 46 80 100 98

Questions?

Measurement Frequencies Numerical categories are also called “classes”

Measurement Frequencies For numerical categories, the maximum and minimum values in each category are called the “class limits”

Measurement Frequencies What are the class limits for the Franklin St data?

Measurement Frequencies For numerical categories, the range of values included in each category is called the “width”

FREQUENCIES IN-CLASS PROBLEM What is the class width for the Franklin St data?

Measurement Frequencies The middle of each numerical category is called the “midpoint” Add the maximum and minimum (class limits) and divide by 2

FREQUENCIES IN-CLASS PROBLEM What is the midpoint for the first class in the Franklin St data?

Measurement Frequencies Rounding may move observed values into different numerical categories The actual maximum and minimum values that end up in a given numerical category are called the “class boundaries”

FREQUENCIES IN-CLASS PROBLEM What are the class boundaries for the second category in the Franklin St data?

Questions?

FREQUENCIES IN-CLASS PROBLEM What graph? Which are frequency distributions?

Questions?