Published by Ralph Gray. Modified over 9 years ago.
1
Chapter 3 Looking at Data: Distributions
Introduction
3.1 Displaying Distributions with Graphs
2
3.1 Displaying Distributions with Graphs
Variables
Examining Distributions of Variables
Graphs for Categorical Variables: bar graphs, pie charts
Graphs for Quantitative Variables: histograms, stemplots, time plots
3
Statistics
Statistics is the science of learning from data. The first step in dealing with data is to organize your thinking about the data. An exploratory data analysis is the process of using statistical tools and ideas to examine data in order to describe their main features.
Exploring Data
Begin by examining each variable by itself. Then move on to study the relationships among the variables.
Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.
4
Variables
We construct a set of data by first deciding which cases (also called observations, individuals, or units) we want to study. For each case, we record information about characteristics that we call variables.
Individual: an object described by data.
Variable: a characteristic of the individual.
Categorical variable: places an individual into one of several groups or categories.
Quantitative variable: takes numerical values for which arithmetic operations make sense.
5
Quantitative Variables
Quantitative variables can be counts, measurements, or rates. See Example 3.6 on page 84 (in the Chapter 3 Introduction) for why rates are important.
Rate = number of occurrences of the event per X in the population of all possible occurrences, where X is a large number (e.g., 10,000 or 100,000).
Murder rate in NH County = (# murders in NHC / # possible murders, i.e., population) * large number (like 100,000).
In 2012 there were 9 murders and the population estimate was 209234, so the murder rate in NH County in 2012 is 9/209234 = 0.00004301 murders per person (per capita). Multiply by 100,000 to get the rate per 100,000: 0.00004301 * 100000 = 4.3 murders per 100,000 people.
Guilford County: 26/500879 * 100000 = 5.2; etc.
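The rate arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `rate_per` and its `per` parameter are made up for this example, and the counts and population estimates are the ones quoted on the slide.

```python
def rate_per(events: int, population: int, per: int = 100_000) -> float:
    """Return the number of events per `per` members of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * per


# Figures quoted on the slide (2012 counts and population estimates).
nh_rate = rate_per(9, 209234)
guilford_rate = rate_per(26, 500879)

print(f"NH County: {nh_rate:.1f} murders per 100,000 people")
print(f"Guilford County: {guilford_rate:.1f} murders per 100,000 people")
```

Expressing both counties per 100,000 people puts them on the same scale, which is why a rate is more informative here than the raw counts of 9 and 26.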
6
Distribution of a Variable
To examine a single variable, we graphically display its distribution.
The distribution of a variable tells us what values it takes and with what frequency it takes on these values.
Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the kind of variable and how easy the graph is to draw. JMP makes easy work of graphing!
Categorical variable: pie chart (I don’t recommend pie charts!) or bar graph (these are fine!).
Quantitative variable: histogram or stemplot.
7
Categorical Variables
The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall into each category.
Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories (hard to draw and hard to interpret!).
Bar graphs represent each category as a bar whose height shows the category count or percent.
8
Pie Charts and Bar Graphs
Material                    Weight (million tons)   Percent of total
Food scraps                 25.9                    11.2%
Glass                       12.8                    5.5%
Metals                      18.0                    7.8%
Paper, paperboard           86.7                    37.4%
Plastics                    24.7                    10.7%
Rubber, leather, textiles   15.8                    6.8%
Wood                        12.7                    5.5%
Yard trimmings              27.7                    11.9%
Other                       7.5                     3.2%
Total                       231.9                   100.0%
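As a rough sketch of how the percent column and a bar graph relate to the weights, the Python below recomputes each percent from the table and prints a crude text bar graph. The helper name `percent_of_total` and the choice of one `#` per percentage point are assumptions made for illustration. The total of 231.9 is taken from the table as stated.

```python
# Weights in million tons, copied from the table above.
weights = {
    "Food scraps": 25.9,
    "Glass": 12.8,
    "Metals": 18.0,
    "Paper, paperboard": 86.7,
    "Plastics": 24.7,
    "Rubber, leather, textiles": 15.8,
    "Wood": 12.7,
    "Yard trimmings": 27.7,
    "Other": 7.5,
}
TOTAL = 231.9  # total stated in the table


def percent_of_total(weight: float, total: float) -> float:
    """Return weight as a percent of total, rounded to one decimal place."""
    return round(weight / total * 100, 1)


percents = {name: percent_of_total(w, TOTAL) for name, w in weights.items()}

# Text bar graph: bar length is proportional to the category percent.
for name, pct in percents.items():
    bar = "#" * round(pct)
    print(f"{name:<26} {pct:5.1f}% {bar}")
```

Because every bar starts from the same baseline, the categories can be compared by length alone, which is the usual argument for preferring a bar graph to a pie chart.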
9
Quantitative Variables
The distribution of a quantitative variable tells us what values the variable takes on and the frequency with which it takes on those values.
Histograms show the distribution of a quantitative variable by using bars whose height represents the number of individuals who take on a value within a particular class.
Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable (original values are not hidden as in the histogram).
Time plots plot each observation (on the vertical axis) against the time at which it was measured (on the horizontal axis).
10
Stemplots
To construct a stemplot:
Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number).
Write the stems in a vertical column; draw a vertical line to the right of the stems.
Write each leaf in the row to the right of its stem; order leaves if desired.
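The construction steps can be sketched in Python as follows. This is a minimal illustration, not the way JMP builds its plots. It assumes non-negative whole-number data, with the last digit as the leaf and everything before it as the stem. The function name `stemplot` is made up for this example.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows for non-negative integers (stem = tens, leaf = ones)."""
    if not values:
        return []
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting first keeps the leaves in order
        leaves[v // 10].append(v % 10)
    rows = []
    # Include every stem between the smallest and largest, even if it is empty,
    # so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {digits}".rstrip())
    return rows


for row in stemplot([106, 100, 119, 110, 101, 106, 110]):
    print(row)
```

Keeping the empty stems matters: a run of empty rows is exactly how a gap or a possible outlier shows up in a stemplot.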
11
Stemplots
Example: Weight data, Introductory Statistics class
Key: 20 | 3 means 203 pounds (stems = 10s, leaves = 1s)
Stems | Leaves
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
12
Stemplots
If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems.
Example: If all of the data values were between 150 and 179, then we may choose to use the following stems:
15
15
16
16
17
17
Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on each lower stem (second “15”).
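Splitting stems can be sketched the same way. In the Python below, each stem appears twice: the first row takes leaves 0 to 4 and the second takes leaves 5 to 9. The function name `split_stemplot` and the sample values are hypothetical and chosen only to fall between 150 and 179.

```python
def split_stemplot(values):
    """Return stemplot rows with each stem split into leaves 0-4 and leaves 5-9."""
    if not values:
        return []
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1  # 0 = upper row (0-4), 1 = lower row (5-9)
        rows.setdefault((stem, half), []).append(leaf)
    stems = [key[0] for key in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):
            digits = "".join(str(d) for d in rows.get((stem, half), []))
            lines.append(f"{stem} | {digits}".rstrip())
    return lines


# Hypothetical data between 150 and 179.
sample = [150, 152, 155, 157, 161, 168, 170, 174, 175, 179]
for line in split_stemplot(sample):
    print(line)
```

With split stems the same ten values spread over six rows instead of three, which gives a better view of the shape when the range is narrow.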
13
Histograms
For quantitative variables that take many values and/or large datasets:
Divide the possible values into classes of equal width.
Count how many observations fall into each class (counts may be converted to percents).
Draw a picture representing the distribution: bar heights are equal to the number (or percent) of observations in each class.
JMP does all three of the above with a couple of clicks: Analyze -> Distribution -> choose the variable to be plotted.
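The three steps can also be done by hand in a few lines of Python, which may help show what JMP is doing behind the clicks. This is a sketch under stated assumptions. The function names are made up. The class convention (each class includes its lower endpoint and excludes its upper endpoint) is one common choice, not necessarily the one JMP uses. The sample values are the weights read off the first three stems (10, 11, 12) of the weight stemplot in this presentation.

```python
def class_counts(values, start, width, n_classes):
    """Count observations in equal-width classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return counts


def to_percents(counts):
    """Convert class counts to percents of all observations, to one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


# Weights (pounds) from stems 10, 11, and 12 of the weight stemplot.
weights = [100, 101, 106, 106, 110, 110, 119, 120, 120, 123, 124, 125, 127, 128]

counts = class_counts(weights, start=100, width=10, n_classes=3)
percents = to_percents(counts)

# Text histogram: one '#' per observation in the class.
for i, count in enumerate(counts):
    low = 100 + 10 * i
    print(f"{low}-{low + 9}: {'#' * count} ({percents[i]}%)")
```

Unlike the stemplot, the histogram keeps only the class counts, so the individual weights can no longer be read back from the picture.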
14
Histograms
Example: Weight data, Introductory Statistics class
[Figure: histogram of the weight data]
15
Examining Distributions
In any graph of data, look for the overall pattern and for striking deviations from that pattern.
You can describe the overall pattern by its shape, center, and spread.
An important kind of deviation is an outlier, an individual that falls outside the overall pattern.
16
Examining Distributions - Shape
A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.
A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
A distribution is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
[Figure panels: Symmetric, Skewed-left, Skewed-right]
17
Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
This overall pattern is fairly symmetrical, except for two states that clearly do not belong to the main trend. Alaska and Florida have unusual representation of the elderly in their populations.
Large gaps in the distribution are places to look for outliers.
[Figure labels: Alaska, Florida]
18
Time Plots
A time plot shows the behavior of a quantitative variable over time. Time is always on the horizontal axis, and the variable being plotted is on the vertical axis.
Look for an overall pattern (trend), and deviations from this trend. Connecting the data points by lines may emphasize this trend.
Look for patterns that repeat at known regular intervals (seasonal variations).
Go over the US Regular Retail Gas Prices in JMP.
19
Time Plots
Look at the gas price data…
20
HW: use JMP whenever possible to draw the graphs.
Begin reading the Intro. to Ch. 3 and section 3.1; work through the Examples in 3.1; do Exercises 3.7, 3.10-3.14, 3.21, 3.24, 3.25, 3.27, 3.32, 3.33-3.36, 3.38 (JMP), 3.39.