1
Chapter 2 Picturing Variation with Graphs
2
Chapter 2 Topics Visualizing variation in numerical and categorical data Summarizing important features in numerical and categorical distributions
3
Visualizing variation in numerical data
Section 2.1 Exploring a Distribution of Numerical Data: Dotplots, Frequency / Relative Frequency Histograms, and Stemplots
4
What Is a Distribution?
Recall: Any collection of data will have variation within the data. The most important tool for organizing the variation in data is called the distribution of the sample.
Three Main Components of a Numerical Distribution: Shape (What does it look like visually?); Center (typical value); Variability (horizontal spread).
5
Distribution: Example
Here are some raw data from the National Collegiate Athletic Association (NCAA), available online. This set of data shows the number of goals scored by first-year NCAA female soccer players in Division III in the 2012 season.
9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15, 15, 16, 16, 16, 16, 18, 18, 19, 19, 20, 20, 21, 35
Note that: Distributions record all observed data values. Seeing patterns in the raw data may be difficult.
6
Frequency Tables The distribution of data can be organized in a frequency table: (This is the data from the previous example.) Keep in mind: A frequency table lists all data values with their counts. Patterns may still be difficult to see.
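The frequency table itself did not survive in this transcript, but it can be rebuilt from the raw data listed in the previous example. The following is a minimal sketch; the function name `frequency_table` is an illustrative choice, not something from the slides.

```python
from collections import Counter

# Goals scored by first-year NCAA Division III women's soccer players (2012),
# copied from the raw data in the previous example.
goals = [9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15, 15,
         16, 16, 16, 16, 18, 18, 19, 19, 20, 20, 21, 35]


def frequency_table(values):
    """Return (value, count) pairs sorted by data value."""
    return sorted(Counter(values).items())


# Print one row per distinct value: the value, then how often it occurs.
print("Goals  Count")
for value, count in frequency_table(goals):
    print(f"{value:>5}  {count:>5}")
```

Each row pairs a data value with its count, which is exactly what the table records. The pattern is still hard to see from the numbers alone, which motivates the graphs that follow.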
7
Examining a Distribution
When examining distributions, use a two-step process: Visualize the data. Use a graph that effectively summarizes the data visually. Using a picture to display the data will help us see patterns. Graphs for numerical data include: Dotplots, histograms, and stemplots
8
Examining a Distribution
Summarize the data. Shape: Is there symmetry? Center: Is there a most common value? Spread: Are any data values far from the rest of the data?
9
Visualizing Data: Goal
Keep in mind: Using a picture to display the data will help us see patterns. Different visual representations capture different aspects in the data. The picture must: Record the data values. Indicate the frequency (count) of the data values.
10
Visualizing Data: Dotplots
Dotplot Record data values on a number line with a dot above the number line for each data value observed. Example: Here is the dotplot for our example data.
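The dotplot image is not part of this transcript. As a rough stand-in, the sketch below prints a text dotplot of the soccer data, turned on its side so that each row is one position on the number line. The function name `dotplot_rows` and the use of `.` for a dot are illustrative assumptions.

```python
from collections import Counter

# Goals data from the earlier example.
goals = [9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15, 15,
         16, 16, 16, 16, 18, 18, 19, 19, 20, 20, 21, 35]


def dotplot_rows(values):
    """One text row per integer from min to max, one dot per observation."""
    counts = Counter(values)  # a missing value counts as 0, giving an empty row
    return [f"{v:>3} | " + "." * counts[v]
            for v in range(min(values), max(values) + 1)]


print("\n".join(dotplot_rows(goals)))
```

Empty rows are kept on purpose: the gap between 21 and 35 is what makes the value 35 stand out.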
11
Dotplot: Example How many textbooks cost $150 or more?
What percent of the textbooks cost $50 or less? Are there any unusually expensive or inexpensive texts?
12
Dotplot: Example 4 textbooks cost $150 or more
8/23 x 100% = 34.8% of the textbooks cost $50 or less The text that costs close to $300 may be unusually expensive.
13
Dotplots: Advantages and Disadvantages
Advantages: Shows individual data values; helps investigate the shape of the distribution.
Disadvantages: Not as common as histograms and other graphs; not great for data with too many individual values.
14
Visualizing Data: Histogram
Group data into intervals, called bins (width of the interval = bin width). Count how many data values fall into each bin.
Each rectangle has the following properties: Consecutive bins touch. The first value in each bin is recorded on the horizontal axis. The height of each rectangle corresponds to the count.
15
Histogram: Example Here is the histogram for our example data.
Note: Vertical axis can show frequency or relative frequency
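The histogram image is missing here, so the sketch below computes the numbers behind one possible histogram of the soccer data. The starting value of 5 and the bin width of 5 are assumptions chosen for illustration; the bins used on the original slide are not known.

```python
# Goals data from the earlier example.
goals = [9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15, 15,
         16, 16, 16, 16, 18, 18, 19, 19, 20, 20, 21, 35]


def histogram_counts(values, start, width):
    """Count values in bins [left, left + width); returns {left_edge: count}.

    Assumes every value is an integer that is at least `start`.
    """
    n_bins = (max(values) - start) // width + 1
    counts = {start + i * width: 0 for i in range(n_bins)}
    for v in values:
        left = start + ((v - start) // width) * width
        counts[left] += 1
    return counts


freq = histogram_counts(goals, start=5, width=5)            # frequency
rel_freq = {left: n / len(goals) for left, n in freq.items()}  # relative frequency

for left in freq:
    print(f"{left:>2}-{left + 4:<2}  {freq[left]:>2}  {rel_freq[left]:.3f}")
```

Frequency and relative frequency differ only by the constant factor 1/28, so either choice of vertical axis gives a histogram with the same shape.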
16
Histogram: Changing Bin Widths
Changing the bin width changes the shape.
17
Notes about Bin Width A width that is:
Too narrow shows too much detail. Too wide hides detail. Most technology (StatCrunch and TI-84) chooses bin widths for an initial look at the data, but one should always experiment with adjusting bin width to see if anything interesting appears.
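To see the effect of bin width without a calculator, the sketch below re-bins the soccer data at three widths. The widths 1, 5, and 20 are arbitrary illustrative choices, and bins are aligned to multiples of the width, which is only one of several reasonable conventions.

```python
from collections import Counter

# Goals data from the earlier example.
goals = [9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15, 15,
         16, 16, 16, 16, 18, 18, 19, 19, 20, 20, 21, 35]


def bin_counts(values, width):
    """List of counts per bin, left to right, with bins aligned to multiples of width."""
    counts = Counter((v // width) * width for v in values)
    lowest, highest = min(counts), max(counts)
    return [counts[left] for left in range(lowest, highest + 1, width)]


for width in (1, 5, 20):
    print(f"width {width:>2}: {bin_counts(goals, width)}")
```

With a width of 1 there are 27 bins, many of them empty, so the picture is mostly noise. With a width of 20 there are only two bins, and the unusual value 35 is no longer visible as a separate bar.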
18
Histogram: Advantages and Disadvantages
Advantages: Good for large data sets; helps focus on the general shape of the data; easy to spot outliers.
Disadvantages: Individual data values are not visible (lost); distribution shape is affected by changes in bin width.
19
Using the TI-84 Calculator
To create a histogram on the TI-84 calculator:
1. Push STAT, then select option 1: Edit. Enter the data set in L1.
2. Push 2nd Y= (for Stat Plot).
3. Turn on Plot1 (press ENTER twice).
4. Use the down arrow, followed by the right arrow, to select the Type that looks like a histogram, and push ENTER.
5. Make sure Xlist is set to L1.
6. Push GRAPH, then ZOOM followed by the number 9 (for option 9: ZoomStat) to see the histogram.
7. Use TRACE and the arrow keys to navigate about the graph (you will see relevant information on the screen).
20
Visualizing Data: Stemplot
Stemplots: Also called stem-and-leaf plots. Like dotplots, they show all individual data values. Useful when technology is not available or when the data set is not too large.
21
Stemplot: Example
A collection of college students who said that they drink alcohol were asked how many alcoholic drinks they had consumed in the last seven days. Their answers were: 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 6, 8, 10, 10, 15, 17, 20, 25, 30, 30, 40 The stemplot for this data is:
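The stemplot figure is not included in this transcript. A minimal sketch that rebuilds one from the listed answers follows, assuming the usual convention that the tens digit is the stem and the ones digit is the leaf; the layout of the original figure may differ.

```python
from collections import defaultdict

# Number of alcoholic drinks in the last seven days, as listed above.
drinks = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 6, 8,
          10, 10, 15, 17, 20, 25, 30, 30, 40]


def stemplot(values):
    """Return stemplot rows: tens digit as the stem, ones digits as the leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    stems = range(min(leaves), max(leaves) + 1)  # keep stems that have no leaves
    return [f"{s} | " + "".join(str(d) for d in leaves[s]) for s in stems]


print("\n".join(stemplot(drinks)))
# 0 | 111112223333345556668
# 1 | 0057
# 2 | 05
# 3 | 00
# 4 | 0
```

Turned on its side, the stemplot looks like a histogram with a bin width of 10, but every individual answer can still be read off.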
22
Summarizing important features of a numerical distribution
Section 2.2 Shape, Center, and Spread; Outliers
23
Shape Three Basic Characteristics of Shape:
Is the distribution symmetric or skewed? How many mounds appear? Are unusually large or small values present?
24
Shape: Symmetric Symmetric Left and right side roughly the same
25
Shape: Skewed Skewed Most of the data is on one side with a long tail (or skew) on the other side
26
Shape: Mounds
Classify data by how many mounds are present: Unimodal: One main mound.
27
Shape: Mounds
Classify data by how many mounds are present: Bimodal: Two main mounds. Multimodal: More than two main mounds.
28
Mounds: Keep In Mind Mounds can be different heights.
Bimodal and multimodal data may indicate the existence of different groups within the data. In this case, it may be preferable to separate the data into groups and provide a separate graph for each group. Examples: Men's and women's heights; afternoon and evening sales at a restaurant.
29
Shape: Examples What shape would you expect to see in a histogram of the following data sets? GPA of college students SAT scores Last digit of Social Security numbers for a random sample of students Income of USA residents
30
Shape: Example
What shape would you expect to see in a histogram of the following data sets?
GPA of college students: Skewed left
SAT scores: Symmetric (unimodal)
Last digit of Social Security numbers for a random sample of students: Symmetric (uniform)
Income of USA residents: Skewed right
31
Shape: Extreme Values Outliers Extremely large or small values
Data values that don’t fit the pattern of the rest of the data Not precisely defined (subject to opinion)
32
Shape: Extreme Values When you see extremely large or small values:
Report the values. Realize they could be sources of error (typos, etc.). Genuine outliers are unusually interesting data values!
33
Center
The typical data value
Example: The histograms for Division III first-year women and men soccer players in 2012 are given below. (Histograms: Women, Men)
The typical scores: Women: 16 goals; Men: 13 goals. It would seem that the typical male soccer player scores fewer goals in a season than the typical female player.
34
Variability
Look at the horizontal spread in the histogram or dotplot: If all data values are similar: Narrow graph. If data values are different: Wider graph.
35
Describing Numerical Distributions: Summary
Always remember to describe a numerical distribution using these three components: Shape (symmetric, skewed left/right, modes) Center (typical value) Variability (horizontal spread)
36
Visualizing variation in categorical variables
Section 2.3 Bar Charts; Pie Charts
37
Visualizing Data: Bar Chart
Note: We treat categorical variables similarly to numerical variables.
Bar Chart: Similar to a histogram. Record the data categories along the horizontal axis, with the height of each bar corresponding to the frequency of that category.
38
Visualizing Data: Bar Chart
Example: Here is the bar chart for the class standing of students interested in a UCLA statistics class.
39
Bar Chart vs. Histogram
Note: Bar charts and histograms are different! Key differences:
                     Histogram                   Bar Chart
Bars:                May touch                   DO NOT touch
Bar Width:           Corresponds to bin width    Can be any desired width (all the same)
Horizontal labels:   Numerical order             No inherent order
NOTE: A Pareto chart is a bar graph in which the bars are arranged from tallest to shortest. (This cannot necessarily be done in a histogram!)
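Because categories have no inherent order, they can be rearranged freely, which is what a Pareto chart does. A minimal sketch follows; the counts are hypothetical placeholders for illustration and are not the UCLA data, and `pareto_order` is an illustrative name.

```python
# Hypothetical counts for illustration only (not data from the slides).
standing_counts = {"Freshman": 12, "Sophomore": 30, "Junior": 18, "Senior": 25}


def pareto_order(counts):
    """Return (category, count) pairs sorted from tallest bar to shortest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


for category, count in pareto_order(standing_counts):
    print(f"{category:<10} {'#' * count} {count}")
```

The same reordering applied to histogram bins would scramble the number line, which is why it is not generally possible for a histogram.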
40
Visualizing Data: Pie Chart
Pie Chart A circle divided into pieces (the area of each piece is proportional to the relative frequency, or percent, of the data in that piece)
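Since a full circle is 360 degrees, a piece whose area is proportional to a relative frequency has a central angle equal to that relative frequency times 360. The sketch below works this out; the counts are hypothetical placeholders, not data from the slides, and `pie_slices` is an illustrative name.

```python
# Hypothetical counts for illustration only (not data from the slides).
standing_counts = {"Freshman": 10, "Sophomore": 20, "Junior": 30, "Senior": 40}


def pie_slices(counts):
    """Map each category to (relative frequency, central angle in degrees)."""
    total = sum(counts.values())
    return {cat: (n / total, 360 * n / total) for cat, n in counts.items()}


for cat, (share, angle) in pie_slices(standing_counts).items():
    print(f"{cat:<10} {share:6.1%} {angle:6.1f} degrees")
```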
41
Visualizing Data: Pie Chart
Example: Here is the pie chart for the class standing of students interested in a UCLA statistics class.
42
Summarizing categorical distributions
Section 2.4 Mode and Variability in Categorical Distributions; Describing Categorical Distributions
43
Describing a Categorical Distribution
Recall: To describe a numerical distribution, we record shape, center, and spread. Since categorical data has no inherent order, these measures do not make sense for categorical data. Two Main Components of a Categorical Distribution: Mode (typical, or most frequent, outcome) Variability (or diversity in outcomes)
44
Mode
The category that occurs most frequently
Key difference in the mode for categorical and numerical data: Numerical data: Mounds do not need to be the same height. Categorical data: Modes must be roughly the same height.
We use the same wording as before: Unimodal: One distinct mode. Bimodal: Two modes with the same (or very close) frequency. Multimodal: More than two modes with the same (or very close) frequency.
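One way to make "roughly the same height" concrete is to count every category whose frequency is within a small tolerance of the largest one. The sketch below does this; the 5% tolerance is an arbitrary assumption (the slides give no cutoff), the function names are illustrative, and the counts are hypothetical rather than survey results.

```python
def modes(counts, tolerance=0.05):
    """Categories whose count is within `tolerance` (a fraction) of the top count."""
    top = max(counts.values())
    return [cat for cat, n in counts.items() if n >= (1 - tolerance) * top]


def describe_modes(counts, tolerance=0.05):
    """Label a categorical distribution as unimodal, bimodal, or multimodal."""
    k = len(modes(counts, tolerance))
    return {1: "unimodal", 2: "bimodal"}.get(k, "multimodal")


# Hypothetical counts for illustration only.
one_clear_winner = {"A": 30, "B": 50, "C": 20}
two_close_winners = {"A": 40, "B": 39, "C": 10}

print(modes(one_clear_winner), describe_modes(one_clear_winner))
print(modes(two_close_winners), describe_modes(two_close_winners))
```

In practice the judgment is made by eye from the bar chart, and a different tolerance can change the label.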
45
Mode: Example In 2012, the Pew survey asked a new group of 2508 Americans which economic class they identified with. The mode is the middle class.
46
Variability Variability Think of this as diversity in the data values
What to look for: High variation: Each value is represented with about the same frequency (many observations in many different categories). Low variation: A small number of values appear a large number of times (many observations fall into a few categories). CAUTION! Variability here is more about the occurrence of many different values rather than many frequencies.
47
Variability: Example The bar charts below show the ethnic composition of two schools in the Los Angeles City School System. School A has the greater variability in ethnicity.
48
Interpreting graphs
Section 2.5 Making Appropriate Graphs; Misleading Graphs
49
Appropriate Graphs
Recall: The type of data you are dealing with determines the type of graph you use!
Numerical Data: Dotplot, Histogram, Stemplot
Categorical Data: Pie Chart, Bar Graph
50
Appropriate Measures
Recall: The type of data you are dealing with determines how you describe the distribution of data!
Numerical Data: Shape, Center, Spread
Categorical Data: Mode, Variability
51
Misleading Graphs
Well-designed graphs help us see patterns, but misleading graphs play tricks with our eyes and lead to wrong conclusions!
Watch For                                                   Example
Inappropriate scaling (starting at a value other than 0)    Figure 2.30
Using icons of different sizes rather than bars             Figure 2.31
52
Case Study
Question: Are private 4-year schools better than public 4-year schools?
Note: Better can mean different things!
Measure: Student-to-teacher ratio (one of many choices)
Data type: Numerical (see Table 2.1 in book)
When: The academic year
Who: 89 private colleges and 49 public colleges
53
Case Study Analysis of data: What can you say about this data?
Can you answer the question of interest? (See book for answer)