Exploring Numerical Data


Objectives: Students will be able to 1) graph the distribution of a numerical variable, 2) calculate summary statistics for a distribution of a numerical variable, and 3) compare distributions of a numerical variable.

NL vs AL: In MLB, what is the lineup difference between the NL and the AL? In 1973, the AL adopted the designated hitter (DH). The DH is a player who only bats (does not play defense). In the AL, the DH bats in place of the pitcher. The DH was designed to increase offense, which would in turn generate more interest in AL games. The assumption was that fans would like to see more offense. Does the DH increase offense in MLB?

… numerical variables are variables whose possible outcomes take on numerical values that represent different quantities of the variable. Examples: the number of runs scored by teams in the AL; the number of sacks in an NFL season by DeMarcus Ware; the amount of time it takes to swim 100 meters. It is beneficial to begin an analysis of numerical variables with a graph of the data.

Here are the run totals for the 30 MLB teams in 2008. Note: the Astros were still in the NL.

There are various ways to graph the distribution of numerical variables. We already know how to make a dotplot. Note: when making dotplots that compare two distributions, it is important to ensure the dotplots are on the same scale. Otherwise, the distributions are difficult to compare.
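The point about a shared scale can be sketched with a small text dotplot. This is only an illustration: the function `text_dotplot`, the axis, and both data sets are invented here and are not the run totals from the slides. Both groups are drawn against the same `axis`, so a dot in a given column means the same value in either plot.

```python
from collections import Counter

def text_dotplot(values, axis):
    """Return the rows of a text dotplot (top row first) drawn over `axis`.

    Every value must sit on one of the axis positions; this is a
    simplifying assumption of the sketch.
    """
    if any(v not in axis for v in values):
        raise ValueError("every value must be one of the axis positions")
    counts = Counter(values)
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):
        # Put a dot in a column when that value occurs at least `level` times.
        rows.append("".join("o" if counts[x] >= level else " " for x in axis))
    return rows

# Hypothetical data, invented for illustration only.
axis = list(range(0, 6))        # shared scale: 0, 1, 2, 3, 4, 5
group_a = [1, 2, 2, 3]
group_b = [2, 3, 3, 3, 5]

for name, group in (("A", group_a), ("B", group_b)):
    print(f"Group {name}")
    for row in text_dotplot(group, axis):
        print(row)
    print("".join(str(x) for x in axis))
```

Because both plots use the same axis, the stacks line up column by column, which is what makes the two distributions comparable by eye.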

Here are the dotplots comparing the distribution of runs scored for AL and NL teams in 2008 (pg 120). At first glance, the distribution of runs scored seems fairly similar for both leagues. Perhaps AL teams score a little more often than NL teams.

Histograms: A histogram is a graph that divides the values of a numerical variable into classes and uses bars to represent the number of values in each class. The frequency describes the number of observations in each class. For our histogram, the number of runs will be broken down into classes, and the frequency will be the number of teams in those classes.

One easy way to make a histogram is by starting with a dotplot, and building from there. Let’s make a histogram for the number of runs scored during 2008 for all 30 MLB teams. Step 1: Start with a dotplot showing the runs scored for each of the 30 MLB teams.

Step 2: Divide the data into 5 to 10 equally wide classes. For this example, we can use classes that are 50 runs wide. Therefore, our first class will be 600-650 runs, the next class will be 650-700 runs, and so on.

Step 3: Count how many observations are in each class. If an observation falls exactly on a boundary, it is considered part of the class above the boundary. For example, the observation on 750 would count as part of the 750-800 class.

Step 4: Draw bars for each class. Bars should be equally wide and have no spaces between them. The height of each bar corresponds to the number of observations in that class.
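Steps 2 through 4 can be sketched in a few lines of Python. This is a minimal illustration, not the method used to build the slides: the helper `class_counts` is a name chosen here, and the run totals are invented values, not the actual 2008 data. The classes are 50 runs wide starting at 600, and a value that lands exactly on a boundary is counted in the class above it, as described in Step 3.

```python
def class_counts(values, start, width, n_classes):
    """Count how many values fall in each class [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        # Floor division sends a boundary value (e.g. 750) to the class above.
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} is outside the classes")
        counts[index] += 1
    return counts

# Hypothetical run totals, invented for illustration (not the 2008 data).
runs = [612, 640, 655, 690, 700, 714, 735, 750, 750, 768, 799, 811, 845, 901]
counts = class_counts(runs, start=600, width=50, n_classes=7)

# Draw one text "bar" per class; bar length is the class frequency.
for i, count in enumerate(counts):
    lower = 600 + 50 * i
    print(f"{lower}-{lower + 50}: {'#' * count}")
```

In this made-up data set the two values of exactly 750 both land in the 750-800 class, and the empty 850-900 class still gets a (zero-height) bar so the classes stay equally wide.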

It is also possible to make a relative frequency histogram. This shows the percentage of observations in each class, rather than the number of observations.

When comparing two or more histograms, use the same scales! The scales on the horizontal axes should match. The scales on the vertical axes should match. When the number of observations is not the same between distributions, we should make a relative frequency histogram. Let’s look at why.

Here are two frequency histograms comparing the number of points scored for players on the LA Lakers and players not on the Lakers in the 2008-2009 regular season. Because there are many more players not on the Lakers, it is hard to compare these distributions.

Let’s now use a relative frequency histogram: The comparison is now much easier to make.
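A small sketch shows why relative frequencies help when the groups differ in size. The class counts below are invented for illustration and are not the actual Lakers data; `relative_frequencies` is a name chosen here. Both groups use the same classes, but one group is 40 times larger than the other.

```python
def relative_frequencies(counts):
    """Convert class counts to percentages of the group total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one observation")
    return [round(100 * c / total, 1) for c in counts]

# Hypothetical class counts on the same three classes (invented values).
small_group = [2, 5, 3]         # 10 observations
large_group = [80, 200, 120]    # 400 observations

small_pct = relative_frequencies(small_group)
large_pct = relative_frequencies(large_group)

print("small group:", small_pct)
print("large group:", large_pct)
```

On a frequency scale the large group's bars would tower over the small group's, yet the percentages show that in this example the two groups have the same shape.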

Describing the Shape of the Distribution: There are several phrases we can use to describe the shape of the distribution of numerical data. Let’s look at this using different data from all 2009 MLB players who had at least 300 plate appearances. A plate appearance occurs each time a player takes their turn at bat.

Symmetric: A distribution is symmetric if the left side of the graph is roughly a mirror image of the right side.

Skewed right: A distribution is skewed to the right when the right side of the graph is more spread out than the left side.

Skewed left: A distribution is skewed to the left when the left side of the graph is more spread out than the right side.

Unimodal: A distribution is unimodal when it shows one distinct peak. Note: the previous three graphs can also be considered unimodal.

Bimodal: A distribution is bimodal if it has two distinct peaks. This graph has a peak at 0 and a peak at 0.8.

Caution: Unimodal vs Bimodal. A common error is calling a distribution bimodal when it is really unimodal. To call a distribution bimodal, the peaks need to be clearly distinct. Sometimes a peak occurs because of our choice of class boundaries. A good rule of thumb is that if moving one or two observations would eliminate a peak, then there is a good chance that the peak is only there because of our choice of boundaries.

Caution: Unimodal vs Bimodal. Here are two histograms that use the exact same data, but different class widths. The first looks like it has two peaks, but the second seems clearly unimodal.
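The effect of class width on apparent peaks can be reproduced with a short sketch. Everything here is an assumption made for illustration: the data values are invented, and `count_peaks` uses one simple definition of a peak (a class strictly taller than both of its neighbors), which is not a definition given in the slides.

```python
def class_counts(values, start, width, n_classes):
    """Count values per class [lower, upper); assumes every value is in range."""
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts

def count_peaks(counts):
    """Count classes that are strictly taller than both neighboring classes."""
    padded = [0] + counts + [0]   # treat the space beyond each end as empty
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )

# Hypothetical data, invented for illustration only.
data = [2, 6, 7, 9, 11, 13, 16, 17, 18, 22]

narrow = class_counts(data, start=0, width=5, n_classes=5)
wide = class_counts(data, start=0, width=10, n_classes=3)

print("width 5: ", narrow, "peaks:", count_peaks(narrow))
print("width 10:", wide, "peaks:", count_peaks(wide))
```

With classes 5 wide, the same ten values appear to have two peaks; with classes 10 wide, only one remains. Moving a single observation into the middle narrow class would also remove the dip, which is the situation the rule of thumb warns about.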

Uniform: A distribution is uniform when the heights of the bars are all about the same.

Dotplot vs Histogram: As a general rule of thumb, when the data sets are small, a dotplot is more useful (it allows us to see each individual observation). When the data sets are large, a histogram is more useful. Think about trying to make a dotplot of the heights of all Americans. There would be way too many dots.