Displaying and Summarizing Quantitative Data


Presentation on theme: "Displaying and Summarizing Quantitative Data"— Presentation transcript:

1 Displaying and Summarizing Quantitative Data
Chapter 4 Displaying and Summarizing Quantitative Data Copyright © 2009 Pearson Education, Inc.

2 Dealing With a Lot of Numbers…
Summarizing the data will help us when we look at large sets of quantitative data. Without summaries of the data, it’s hard to grasp what the data tell us. The best thing to do is to make a picture… We can’t use bar charts, frequency tables or pie charts for quantitative data, since those displays are for categorical variables.

3 Types of Graphs (Quantitative Data)
Histograms Stem Plots (Stem and Leaf) Box Plots (Box and Whisker) Dot Plots

4 Histograms: Earthquake Magnitudes
A histogram breaks up the entire span of values covered by the quantitative variable into equal-width piles called bins. A histogram plots the bin counts as the heights of bars (like a bar chart). Here is a histogram of earthquake magnitudes:

5 Histograms: Earthquake magnitudes
A relative frequency histogram displays the percentage of cases in each bin instead of the count. In this way, relative frequency histograms are faithful to the area principle. Here is a relative frequency histogram of earthquake magnitudes:
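A minimal sketch of the binning idea behind both kinds of histogram, assuming made-up magnitudes rather than the data shown on the slide. The function name `bin_counts` and the choice to put a boundary value into the higher bin are illustrative assumptions.

```python
import math

def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers [start + i*width, start + (i+1)*width), so a value
    sitting exactly on a boundary goes into the higher bin.
    """
    counts = [0] * n_bins
    for v in values:
        i = math.floor((v - start) / width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical magnitudes, invented for illustration only.
magnitudes = [5.1, 5.6, 6.2, 6.4, 6.8, 7.0, 7.1, 7.3, 7.9, 8.4]
start, width = 5.0, 1.0

counts = bin_counts(magnitudes, start, width, n_bins=4)
# Relative frequency: the share of all cases in each bin.
relative = [c / len(magnitudes) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    lo = start + i * width
    print(f"[{lo:.1f}, {lo + width:.1f})  {'#' * c:<5} count={c}  share={r:.0%}")
```

The bar heights change from counts to percentages, but the shape of the display stays the same.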

6 Stem Plots (Stem and Leaf)
Stem-and-leaf displays show the distribution of a quantitative variable, like histograms do, while preserving the individual values.

7 Constructing a Stem-and-Leaf Display
First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”). Use the stems to label the bins. Use only one digit for each leaf—either round or truncate the data values to one decimal place after the stem.
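The construction steps can be sketched in a few lines, assuming two-digit whole-number data so that the tens digit is the stem and the ones digit is the leaf. The data and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative whole numbers.

    The stem is every digit except the last; the leaf is the last digit.
    Stems with no leaves are still listed so gaps stay visible.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical values, invented for illustration only.
pulse_rates = [88, 80, 76, 72, 68, 64, 60, 56, 84, 72, 68, 64]
print("\n".join(stem_and_leaf(pulse_rates)))
```

Each row works like a histogram bar turned on its side, except that the individual values can still be read off.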

8 Dotplots
A dotplot is a simple display. It just places a dot along an axis for each case in the data. The dotplot to the right shows Kentucky Derby winning times, plotting each race as its own dot. You might see a dotplot displayed horizontally or vertically. [Figure: dotplot with axes "# of Races" and "Winning Time (sec)"]

9 Think Before You Draw, Again
Remember: before you decide what type of graph to use, check what kind of variable you are graphing (quantitative or categorical). Shape, Center, and Spread. When describing a distribution, always tell about three things: shape, center, and spread.

10 What is the Shape of the Distribution?
Is the histogram symmetric or skewed? Do any unusual features stick out?

11 Symmetry Is the histogram symmetric?
If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.

12 Symmetry (cont.)
The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail. In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.

13 Anything Unusual? Do any unusual features stick out?
Sometimes it’s the unusual features that tell us something interesting or exciting about the data. You should always mention any stragglers, or outliers, that stand off away from the body of the distribution. Are there any gaps in the distribution? If so, we might have data from more than one group.

14 Anything Unusual? (cont.)
The following histogram appears to have outliers: there are three cities in the leftmost bar.

15 Symmetric Distributions – The Mean
When we have symmetric data, there is an alternative to the median: if we want to calculate a number, we can average the data. We use the Greek letter sigma (Σ) to mean “sum” and write: $\bar{y} = \frac{\sum y}{n}$. The formula says that to find the mean, we add up the numbers and divide by n.

16 Center of a Distribution: Median
The median is the value with exactly half the data values below it and half above it. It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas. It has the same units as the data.

17 Symmetric Distributions (Mean vs. Median)
Because the median considers only the order of values, it is resistant to values that are extraordinarily large or small; it simply notes that they are one of the “big ones” or “small ones” and ignores their distance from center. To choose between the mean and median, start by looking at the data. If the histogram is symmetric and there are no outliers, use the mean. However, if the histogram is skewed or has outliers, you are better off with the median.
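A quick illustration of that resistance, using Python's standard `statistics` module and made-up numbers: adding one extreme value drags the mean a long way but barely moves the median.

```python
from statistics import mean, median

# Hypothetical values, invented for illustration only.
salaries = [30, 32, 35, 36, 40]
with_outlier = salaries + [400]      # add one extraordinarily large value

print("without outlier:", mean(salaries), median(salaries))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

Here the mean jumps from 34.6 to 95.5 while the median only moves from 35 to 35.5, which is why the median is the safer summary for skewed data or data with outliers.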

18 Spread: The Interquartile Range
The interquartile range (IQR) lets us ignore extreme data values and concentrate on the middle of the data. To find the IQR, we first need to know what quartiles are…

19 Spread: The Interquartile Range
Quartiles divide the data into four equal sections. One quarter of the data lies below the lower quartile, Q1. One quarter of the data lies above the upper quartile, Q3. The difference between the quartiles is the interquartile range (IQR), so IQR = upper quartile – lower quartile, or IQR = Q3 – Q1.

20 Spread: The Interquartile Range (cont.)
The lower and upper quartiles are the 25th and 75th percentiles of the data, so the IQR contains the middle 50% of the values of the distribution, as shown in the figure:

21 5-Number Summary
The 5-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum). The 5-number summary for the recent tsunami earthquake magnitudes looks like this: We will use the 5-number summary later on to create a box plot.
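A sketch of how the quartiles, the IQR, and the 5-number summary might be computed. It assumes one common convention (each quartile is the median of one half of the sorted data, with the middle value left out of both halves when the count is odd); textbooks and software differ on this detail, so results can vary slightly. The data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data; with an odd count the middle value is
    excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical values, invented for illustration only.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 40]
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1

print("min, Q1, median, Q3, max:", low, q1, med, q3, high)
print("IQR = Q3 - Q1 =", iqr)
```

Note that the extreme value 40 affects only the maximum; the quartiles and the IQR describe the middle half of the data.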

22 What About Spread? The Standard Deviation
A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean. A deviation is the distance that a data value is from the mean. Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations.

23 What About Spread? The Standard Deviation
The variance, notated by s², is found by summing the squared deviations from the mean $\bar{y}$ and (almost) averaging them: $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$. The variance will play a role later in our study, but it is problematic as a measure of spread: it is measured in squared units!

24 What About Spread? The Standard Deviation
The standard deviation, s, is just the square root of the variance and is measured in the same units as the original data.
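The steps from deviations to standard deviation can be traced with a short sketch. It assumes the usual sample formulas (divide the summed squared deviations by n − 1, then take the square root) and uses made-up data; the result is compared with `statistics.stdev` from the standard library.

```python
from math import sqrt
from statistics import stdev

# Hypothetical values, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
ybar = sum(data) / n                       # the mean

deviations = [y - ybar for y in data]      # distance of each value from the mean
squared = [d ** 2 for d in deviations]     # square so they no longer cancel out

s2 = sum(squared) / (n - 1)                # variance: "almost" an average
s = sqrt(s2)                               # standard deviation: back in original units

print("sum of deviations:", sum(deviations))
print("variance:", round(s2, 3))
print("standard deviation:", round(s, 3), "vs statistics.stdev:", round(stdev(data), 3))
```

The deviations sum to zero, which is the reason for squaring them before averaging.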

25 Thinking About Variation
Since Statistics is about variation, spread is an important fundamental concept of Statistics. Measures of spread help us talk about what we don’t know. When the data values are tightly clustered around the center of the distribution, the IQR and standard deviation will be small. When the data values are scattered far from the center, the IQR and standard deviation will be large.

26 Summary: Shape, Center, and Spread
Shape: Symmetric vs. Skewed (left or right). Center: Mean vs. Median. Spread: Standard Deviation vs. IQR. Mean and Standard Deviation always go together. Median and IQR always go together.

27 Tell - What About Unusual Features?
If there are multiple modes, try to understand why. If you identify a reason for the separate modes, it may be good to split the data into two groups. If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.

28 Understanding and Comparing Distributions
Chapter 5 Understanding and Comparing Distributions

29 The Five-Number Summary
The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum). Example: The five-number summary for the daily wind speed is:
Max     8.67
Q3      2.93
Median  1.90
Q1      1.15
Min     0.20

30 Daily Wind Speed: Making Boxplots
A boxplot is a graphical display of the five-number summary. Boxplots are particularly useful when comparing groups.

31 Constructing Boxplots (Outlier Check)
Draw a single vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.

32 Constructing Boxplots (Outlier Check)
Erect “fences” around the main part of the data. The upper fence is 1.5 IQRs above the upper quartile. The lower fence is 1.5 IQRs below the lower quartile. Note: the fences only help with constructing the boxplot and should not appear in the final display.

33 Constructing Boxplots (cont.)
Use the fences to grow “whiskers.” Draw lines from the ends of the box up and down to the most extreme data values found within the fences. If a data value falls outside one of the fences, we do not connect it with a whisker.

34 Constructing Boxplots (cont.)
Add the outliers by displaying any data values beyond the fences with special symbols. We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.

35 Check For Outliers
First we find the IQR. Then we multiply the IQR by 1.5. We then subtract that number from Q1. We also add that number to Q3. Any value falling outside of that range of values is considered an outlier. Formula: (Q1 – 1.5(IQR), Q3 + 1.5(IQR)). Needs to be memorized…
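As a worked sketch of the outlier check, the fences can be computed from the daily wind speed quartiles given earlier (Q1 = 1.15, Q3 = 2.93) and compared with that summary's minimum and maximum. The function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """A value is flagged as an outlier if it falls outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper

# Quartiles and extremes from the daily wind speed five-number summary.
q1, q3 = 1.15, 2.93
lower, upper = outlier_fences(q1, q3)
print("fences:", round(lower, 2), round(upper, 2))

for value in (0.20, 8.67):
    print(value, "outlier" if is_outlier(value, q1, q3) else "not an outlier")
```

The IQR is 1.78 and 1.5 × IQR is 2.67, so the fences sit at about −1.52 and 5.60. The maximum of 8.67 lies beyond the upper fence and would be plotted as an outlier, while the minimum of 0.20 is inside.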

36 Summary (Test Topics)
Categorical vs. Quantitative Variables. Types of Graphs for Categorical and Quantitative Data. Describe a Distribution (Shape, Center, Spread). Shape: Symmetric vs. Skewed. Center: Mean vs. Median. Spread: IQR vs. Standard Deviation. Checking for Outliers: (Q1 – 1.5(IQR), Q3 + 1.5(IQR)).

37 Summary: Shape, Center, and Spread
Mean and Standard Deviation always go together. Median and IQR always go together. Use MEAN and STANDARD DEVIATION when the distribution is symmetric and there are NO Outliers! Otherwise use Median and IQR.

38 The Standard Deviation as a Ruler and the Normal Model
Chapter 6 The Standard Deviation as a Ruler and the Normal Model

39 Standardizing with z-scores
We compare individual data values to their mean, relative to their standard deviation, using the following formula: $z = \frac{y - \bar{y}}{s}$, where $\bar{y}$ is the mean and s is the standard deviation. We call the resulting values standardized values, denoted as z. They can also be called z-scores.

40 Standardizing with z-scores (cont.)
Standardized values have no units. z-scores measure the distance of each data value from the mean in standard deviations. A negative z-score tells us that the data value is below the mean, while a positive z-score tells us that the data value is above the mean.

41 Benefits of Standardizing
Standardized values have been converted from their original units to the standard statistical unit of standard deviations from the mean. Thus, we can compare values that are measured on different scales, with different units, or from different populations.
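A small sketch of comparing values from different scales, assuming the standard definition z = (value − mean) / standard deviation. The two tests and all their numbers are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd

# Hypothetical results on two tests with different scales.
z_a = z_score(82, mean=70, sd=8)        # test A, scored out of 100
z_b = z_score(650, mean=500, sd=120)    # test B, scored out of 800

print("test A z-score:", z_a)
print("test B z-score:", z_b)
print("better relative result:", "A" if z_a > z_b else "B")
```

The raw scores cannot be compared directly, but the z-scores can: 1.5 standard deviations above the mean on test A is a stronger relative result than 1.25 on test B.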

42 Shifting Data
Adding a constant to (or subtracting a constant from) every data value adds (or subtracts) the same constant to measures of position: center, percentiles, max/min. The shape of the distribution and its measures of spread (range, IQR, standard deviation) remain unchanged.

43 Rescaling Data
When we multiply (or divide) all the data values by any constant, all measures of position (such as the mean, median, and percentiles) and measures of spread (such as the range, the IQR, and the standard deviation) are multiplied (or divided) by that same constant.

44 Back to z-scores
Standardizing data into z-scores shifts the data by subtracting the mean and rescales the values by dividing by their standard deviation. Standardizing into z-scores does not change the shape of the distribution. Standardizing into z-scores changes the center by making the mean 0. Standardizing into z-scores changes the spread by making the standard deviation 1.
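The effects of shifting, rescaling, and standardizing can be checked numerically. This is a sketch with made-up data, using `mean` and `stdev` from Python's standard `statistics` module; the constants 10 and 3 are arbitrary.

```python
from statistics import mean, stdev

# Hypothetical values, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
ybar, s = mean(data), stdev(data)

shifted = [y + 10 for y in data]       # shifting: add a constant
rescaled = [y * 3 for y in data]       # rescaling: multiply by a constant
z = [(y - ybar) / s for y in data]     # standardizing: shift, then rescale

print("shifted:  mean change =", round(mean(shifted) - ybar, 6),
      " SD ratio =", round(stdev(shifted) / s, 6))
print("rescaled: mean ratio  =", round(mean(rescaled) / ybar, 6),
      " SD ratio =", round(stdev(rescaled) / s, 6))
print("z-scores: mean =", round(mean(z), 6), " SD =", round(stdev(z), 6))
```

Shifting moves the mean but leaves the standard deviation alone, rescaling multiplies both, and the z-scores end up with mean 0 and standard deviation 1.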

45 When Is a z-score BIG?
A z-score gives us an indication of how unusual a value is because it tells us how far it is from the mean. Remember that a negative z-score tells us that the data value is below the mean, while a positive z-score tells us that the data value is above the mean. The larger a z-score is (negative or positive), the more unusual it is.

46 When Is a z-score Big? (cont.)
There is no universal standard for z-scores, but there is a model that shows up over and over in Statistics. This model is called the Normal model (You may have heard of “bell-shaped curves.”). Normal models are appropriate for distributions whose shapes are unimodal and roughly symmetric. These distributions provide a measure of how extreme a z-score is.

47 When Is a z-score Big? (cont.)
There is a Normal model for every possible combination of mean and standard deviation. We write N(μ,σ) to represent a Normal model with a mean of μ and a standard deviation of σ. We use Greek letters because this mean and standard deviation do not come from data—they are numbers (called parameters) that specify the model.

48 When Is a z-score Big? (cont.)
Summaries of data, like the sample mean and standard deviation, are written with Latin letters. Such summaries of data are called statistics. When we standardize Normal data, we still call the standardized value a z-score, and we write $z = \frac{y - \mu}{\sigma}$.

49 When Is a z-score Big? (cont.)
When we use the Normal model, we are assuming the distribution is Normal. We cannot check this assumption in practice, so we check the following condition: Nearly Normal Condition: The shape of the data’s distribution is unimodal and symmetric. This condition can be checked by making a histogram or a Normal probability plot (to be explained later!).

50 The 68-95-99.7 Rule (Empirical Rule)
Normal models give us an idea of how extreme a value is by telling us how likely it is to find one that far from the mean. We can find these numbers precisely, but until then we will use a simple rule that tells us a lot about the Normal model…

51 The 68-95-99.7 Rule (cont.) It turns out that in a Normal model:
about 68% of the values fall within one standard deviation of the mean; about 95% of the values fall within two standard deviations of the mean; and, about 99.7% of the values fall within three standard deviations of the mean.
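The three percentages can be checked against the standard Normal model with `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later). This is only a numerical check of the rule, not a replacement for it.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # the standard Normal model N(0, 1)

shares = {}
for k in (1, 2, 3):
    # Area between -k and +k standard deviations from the mean.
    shares[k] = standard.cdf(k) - standard.cdf(-k)
    print(f"within {k} SD of the mean: {shares[k]:.4f}")
```

The areas come out to roughly 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% in the rule.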

52 The 68-95-99.7 Rule (cont.)
The following shows what the 68-95-99.7 Rule tells us:

53 Finding Normal Percentiles by Hand
When a data value doesn’t fall exactly 1, 2, or 3 standard deviations from the mean, we can look it up in a table of Normal percentiles. (Extra Z-Score Charts are on my website!)

54 Finding Normal Percentiles by Hand (cont.)
Table Z is the standard Normal table. We have to convert our data to z-scores before using the table. The figure shows us how to find the area to the left when we have a z-score of 1.80:
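The same lookup can be done in software. As a sketch, `statistics.NormalDist` from Python's standard library stands in for Table Z here; its `cdf` method returns the area to the left of a z-score.

```python
from statistics import NormalDist

z = 1.80
area_left = NormalDist().cdf(z)        # standard Normal: mean 0, SD 1
area_right = 1 - area_left

print(f"area to the left of z = {z:.2f}:  {area_left:.4f}")
print(f"area to the right of z = {z:.2f}: {area_right:.4f}")
```

The area to the left is about 0.9641, so roughly 96.4% of values in a Normal model fall below a z-score of 1.80.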

55 From Percentiles to Scores: z in Reverse
Sometimes we start with areas and need to find the corresponding z-score or even the original data value. Example: What z-score represents the first quartile in a Normal model?

56 From Percentiles to Scores: z in Reverse
Look in Table Z for an area of 0.2500. The exact area is not there, but 0.2514 is pretty close. This figure is associated with z = -0.67, so the first quartile is 0.67 standard deviations below the mean.
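Going from an area back to a z-score can also be sketched in software. `NormalDist.inv_cdf` from Python's standard library plays the role of reading Table Z in reverse; converting the z-score back to a data value uses value = mean + z × standard deviation, illustrated here with the IQ model (mean 100, standard deviation 16) from the practice problems.

```python
from statistics import NormalDist

standard = NormalDist()                 # standard Normal: mean 0, SD 1

# z-score with 25% of the area to its left (the first quartile).
z_q1 = standard.inv_cdf(0.25)
print("z for the first quartile:", round(z_q1, 2))

# Undo the standardization to get back to the original scale.
mu, sigma = 100, 16                     # IQ model from the practice problems
q1_value = mu + z_q1 * sigma
print("first quartile of IQ scores:", round(q1_value, 1))
```

The z-score rounds to −0.67, and on the IQ scale the first quartile works out to about 89.2.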

57 What Can Go Wrong? (cont.)
Don’t use the mean and standard deviation when outliers are present—the mean and standard deviation can both be distorted by outliers. Don’t round your results in the middle of a calculation. Don’t worry about minor differences in results.

58 1. A normal distribution of scores has a standard deviation of 10.
Find the z-scores corresponding to each of the following values: a) A score of 60, where the mean score of the sample data values is 40. b) A score that is 30 points below the mean. c) A score of 80, where the mean score of the sample data values is 30. d) A score of 20, where the mean score of the sample data values is 50.

59 2. IQ scores have a mean of 100 and a standard deviation of 16.
Albert Einstein reportedly had an IQ of 160. a. What is the difference between Einstein’s IQ and the mean? b. How many standard deviations is that? c. Convert Einstein’s IQ score to a z score. d. If we consider “usual” IQ scores to be those that convert to z scores between -2 and 2, is Einstein’s IQ usual or unusual?

60 3. Women’s heights have a mean of 63.6 in. and a standard deviation of 2.5 in.
Find the z score corresponding to a woman with a height of 70 inches and determine whether the height is unusual. What percent of women have a height higher than 70 inches?

61 4. Three students take equivalent stress tests.
Which is the highest relative score (meaning which has the largest z score value)? a. A score of 144 on a test with a mean of 128 and a standard deviation of 34. b. A score of 90 on a test with a mean of 86 and a standard deviation of 18. c. A score of 18 on a test with a mean of 15 and a standard deviation of 5.

62 5. As mentioned above, IQ scores are normally distributed with mean 100 and std deviation of 16.
What percent of people have an IQ of less than 100? What percent of people have an IQ between 84 and 100? What percent of people have an IQ higher than 140? What percent of people have an IQ less than 80?

