Understanding and Comparing Distributions

Presentation on theme: "Understanding and Comparing Distributions"— Presentation transcript:

1 Understanding and Comparing Distributions
AP Statistics Chapter 5 Understanding and Comparing Distributions

2 Learning Goals
Know how to construct and analyze boxplots. Be able to calculate outliers and discuss how they deviate from the overall pattern of the data. Know how to compare the distributions of two or more groups by comparing their shapes, centers, and spreads. Be able to display data over time: Time Plots. Understand how re-expressing data can improve the symmetry of the data.

3 Learning Goal 1
Know how to construct and analyze boxplots.

4 Learning Goal 1: The Five-Number Summary
The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum). Example: The five-number summary for the daily wind speed is: Min 0.20 Q1 1.15 Median 1.90 Q3 2.93 Max 8.67

5 Learning Goal 1: The Five-Number Summary
Consists of the minimum value, Q1, the median, Q3, and the maximum value, listed in that order. Offers a reasonably complete description of the center and spread. Calculate on the TI-84 using 1-Var Stats. Used to construct the Boxplot. Example: Five-Number Summary 1: 20, 27, 34, 50, 86 2: 5, 10, 18.5, 29, 33

6 Learning Goal 1: Boxplot
A boxplot is a graphical display of the five-number summary. Boxplots are useful when comparing groups. Boxplots are particularly good at pointing out outliers.

7 Learning Goal 1: Boxplot
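Boxplot: A graphical display of data using the five-number summary: Minimum -- Q1 -- Median -- Q3 -- Maximum. Example: 25% | 25% | 25% | 25% (each of the four sections between these values holds about a quarter of the data).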

8 Learning Goal 1: Boxplots
A graph of the Five-Number Summary. Can be drawn either horizontally or vertically. The box represents the IQR (middle 50%) of the data. Boxplots show less detail than histograms or stemplots, so they are best used for side-by-side comparison of more than one distribution.

9 Learning Goal 1: Constructing Boxplots
Put the data in increasing numerical order (if it isn’t already). Find the median of your set of data. Consider only the values to the left of the median, the lower half of the data, and find the median of the lower half, the 1st quartile Q1. Consider only the values to the right of the median, the upper half of the data, and find the median of the upper half, the 3rd quartile Q3. Determine the lowest and highest values of the data. Draw a scale on a number line and plot the lowest value, lower quartile, median, upper quartile, and the highest value on the number line. Put a line through the lower quartile, median, and upper quartile. Then put a box around those lines. Lastly draw a line from your extreme values to the box.

10 Learning Goal 1: Constructing Boxplots - Example
Put the data in increasing numerical order (if it isn’t already). Can use the TI-84: STAT, SortA(. Example: 100, 27, 34, 54, 59, 18, 52, 61, 78, 68, 82, 87, 85, 93, 91. Ascending order: 18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100.

11 Learning Goal 1: Constructing Boxplots - Example
Find the median of the data. *Remember the median is the value exactly in the middle of an ordered set of numbers.* 18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100. With 15 values, the median is the 8th value: 68.

12 Learning Goal 1: Constructing Boxplots - Example
Next, consider only the values to the left of the median, the lower half of the data. 18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100. Then, find the median of those numbers, the 1st quartile Q1. 18, 27, 34, 52, 54, 59, 61. Q1 = 52.

13 Learning Goal 1: Constructing Boxplots - Example
Next, consider only the values to the right of the median, the upper half of the data. 18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100. Then find the median of those numbers, the 3rd quartile Q3. 78, 82, 85, 87, 91, 93, 100. Q3 = 87.

14 Learning Goal 1: Constructing Boxplots - Example
Now indicate your lowest and highest values. 18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100. The lowest value is 18 and the highest is 100.

15 Learning Goal 1: Constructing Boxplots - Example
Now we are ready to begin to draw the graph. 18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100. Plot the lowest value, lower quartile Q1, median, upper quartile Q3, and the highest value on a number line with an appropriate scale.

16 Learning Goal 1: Constructing Boxplots - Example
Put a line through the Lower Quartile Q1, Median, and Upper Quartile Q3. Then Put a box around those lines.

17 Learning Goal 1: Constructing Boxplots - Example
Lastly, draw a line from each extreme value to the box. There is your boxplot. Don’t forget to label the scale and title the graph.
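The steps in this example can also be checked with a few lines of code. The following is a minimal Python sketch, assuming the median-of-halves rule used on these slides (when the number of values is odd, the median itself is left out of both halves). The function name `five_number_summary` is made up for illustration; other software may use a different quartile rule and give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)            # step 1: put the data in increasing order
    n = len(data)
    half = n // 2
    lower = data[:half]              # values to the left of the median
    upper = data[n - half:]          # values to the right of the median
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Example data from the slides (before sorting)
scores = [100, 27, 34, 54, 59, 18, 52, 61, 78, 68, 82, 87, 85, 93, 91]
print(five_number_summary(scores))   # (18, 52, 68, 87, 100)
```

These five values are the ones plotted on the number line to draw the box and whiskers.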

18 Learning Goal 1: Constructing Boxplots – Your Turn
Given the commuting times (in minutes) of 20 randomly selected New York workers below, construct a boxplot. 10 30 5 25 40 20 15 85 65 60 45

19 Learning Goal 1: Shape from a Boxplot
Boxplots do not display the shape of the distribution as clearly as histograms, but may be useful in determining the shape. [Figure: boxplots of a left-skewed and a right-skewed distribution.]

20 Learning Goal 1: Shape from a Boxplot – Looking at the Median
If the median is close to the center of the box, the distribution of the data values will be approximately symmetrical. If the median is to the left of the center of the box, the distribution of the data values will be Skewed Right. If the median is to the right of the center of the box, the distribution of the data values will be Skewed Left. Skewed Right Skewed Left

21 Learning Goal 1: Shape from a Boxplot– Looking at the Whiskers
If the whiskers are approximately the same length, the distribution of the data values will be approximately symmetrical. If the right whisker is longer than the left whisker, the distribution of the data values will be Skewed Right. If the left whisker is longer than the right whisker, the distribution of the data values will be Skewed Left. Skewed Right Skewed Left
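The two rules of thumb (position of the median in the box, and relative whisker lengths) can be sketched in Python as follows. This is only an illustration: the function name `shape_from_summary` and the `tolerance` cutoff for "approximately the same" are assumptions, since the slides do not say how close two lengths must be to count as equal.

```python
def shape_from_summary(minimum, q1, med, q3, maximum, tolerance=0.2):
    """Guess the shape of a distribution from its five-number summary.

    tolerance is a hypothetical cutoff: two lengths count as "about equal"
    when they differ by no more than this fraction of the longer one.
    """
    def compare(left, right):
        longer = max(left, right)
        if longer == 0 or abs(left - right) <= tolerance * longer:
            return "symmetric"
        # A longer right side means the data trail off to the right.
        return "skewed right" if right > left else "skewed left"

    return {
        "median": compare(med - q1, q3 - med),            # median's position in the box
        "whiskers": compare(q1 - minimum, maximum - q3),  # left vs. right whisker
    }

# Five-number summary 1 from the earlier slide: 20, 27, 34, 50, 86
print(shape_from_summary(20, 27, 34, 50, 86))
# {'median': 'skewed right', 'whiskers': 'skewed right'}
```

When the two checks disagree, the boxplot alone is not enough, and a histogram is the better tool for judging shape.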

22 Learning Goal 1: Distribution Shape and Boxplot
[Figure: distribution curves with matching boxplots for left-skewed, symmetric, and right-skewed data, each marked with Q1, Q2 (the median), and Q3.]

23 Learning Goal 1: Boxplots - Problem
Below you have a boxplot for the tar content of 25 different cigarettes. What is a plausible set of values for the five-number summary? Choices: Min = 13, Q1 = 10, Median = 12.6, Q3 = 14, Max = 15; Min = 1, Q1 = 8.5, Median = 12.6, Q3 = 15, Max = 17; Min = 1, Q1 = 8.5, Median = 11.5, Q3 = 13, Max = 15; Min = 8.5, Q1 = 10, Median = 11.5, Q3 = 15, Max = 17.

24 Learning Goal 1: Boxplots - Problem
The shape of the boxplot below can be described as: Bi-modal; Left-skewed; Right-skewed; Symmetric; Uniform.

25 Learning Goal 1: Boxplots - Problem
What is the approximate range of the Male Wrist Girth dataset shown below? Choices: 14.5 to 19.5; 16.5 to 17; 16.5 to 18; 17 to 19.5; 14.5 to 16.5 and 18 to 19.5.

26 Learning Goal 1: Boxplots - Problem
What is the approximate interquartile range of the Male Wrist Girth dataset shown below? Choices: 14.5 to 19.5; 16.5 to 17; 16.5 to 18; 17 to 19.5; 14.5 to 16.5 and 18 to 19.5.

27 Learning Goal 2 Be able to calculate outliers and discuss how they deviate from the overall pattern of the data.

28 Learning Goal 2: Outliers
Recall that an outlier is an extremely small or extremely large data value when compared with the rest of the data values. What should we do about outliers? Try to understand them in the context of the data. Possible causes: a data error, or something special about the nature of the data.

29 Learning Goal 2: Outliers
If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing. Note: The median and IQR are not likely to be affected by the outliers. The following procedure allows us to check whether a data value can be considered as an outlier.

30 Learning Goal 2: Outliers
The IQR is used to determine whether extreme values are actually outliers. An observation is an outlier if it falls more than 1.5 times the IQR below Q1 or above Q3. To test for outliers, construct an upper and a lower fence: Upper Fence = Q3 + (1.5)IQR, Lower Fence = Q1 – (1.5)IQR. If an observation falls outside the fences (i.e., greater than the upper fence or less than the lower fence), then it is an outlier. Indicate outliers as individual points or asterisks on a boxplot.

31 Learning Goal 2: Outliers – Example (odd set of data)
Data: 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86 Find Q1, M, Q3, IQR and any outliers. Sort the data, then split it at the median: lower half 20, 25, 25, 27, 28, 31, 33 (Q1 = 27); median M = 34; upper half 36, 37, 44, 50, 59, 85, 86 (Q3 = 50). IQR = 50 – 27 = 23. Upper Fence = Q3 + (1.5)IQR = 50 + 34.5 = 84.5. Lower Fence = Q1 – (1.5)IQR = 27 – 34.5 = -7.5. Outliers: 85 and 86 (greater than the upper fence).
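The fence calculation in this example can be sketched in Python. This is a minimal illustration, assuming the same median-of-halves quartile rule used on the slides; the helper names `quartiles` and `find_outliers` are made up for this sketch.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[len(data) - half:])

def find_outliers(values):
    """Return (lower fence, upper fence, list of outliers)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # An observation is an outlier if it falls outside the fences.
    outliers = [x for x in sorted(values) if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
print(find_outliers(data))   # (-7.5, 84.5, [85, 86])
```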

32 Learning Goal 2: Outliers – Example (even set of data)
Find Q1, M, Q3, IQR and outliers. With Q1 = 10 and Q3 = 29: IQR = 29 – 10 = 19. Upper Fence = 29 + (1.5)IQR = 29 + 28.5 = 57.5. Lower Fence = 10 – (1.5)IQR = 10 – 28.5 = -18.5. No outliers.

33 Learning Goal 2: Outliers – Your Turn
The data below represent the 20 countries with the largest number of total Olympic medals, including the United States, which had 101 medals for the 1996 Atlanta games. Determine whether the number of medals won by the United States is an outlier relative to the numbers for the other countries. Data values – 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15, 101.

34 Learning Goal 2: Outliers – Modified Boxplot
Plots outliers as isolated points, where regular boxplots conceal outliers. From now on when we say “boxplot”, we mean “modified boxplot”. The modified boxplot is more useful than the boxplot. Constructing a Modified Boxplot. Same as a boxplot with the exception of the “whiskers”. Extend the “left whisker” to the minimum value if there are no outliers or to the last data value greater than or equal to the lower fence if there are outliers. Extend the “right whisker” to the maximum value if there are no outliers or to the last data value less than or equal to the upper fence if there are outliers. Outliers (either low or high) are then represented by a dot or an asterisk.
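The whisker rule for a modified boxplot can be sketched as follows. This is an illustrative Python sketch, not a plotting routine: it only computes where each whisker ends and which points are drawn separately. The function name `modified_boxplot_parts` is hypothetical, and quartiles are assumed to follow the median-of-halves rule from the earlier slides.

```python
from statistics import median

def modified_boxplot_parts(values):
    """Return whisker endpoints and outliers for a modified boxplot."""
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[len(data) - half:])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data values still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "left_whisker": inside[0],
        "right_whisker": inside[-1],
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

data = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
print(modified_boxplot_parts(data))
# {'left_whisker': 20, 'right_whisker': 59, 'outliers': [85, 86]}
```

With no outliers, `inside` is the whole data set, so the whiskers reach the minimum and maximum exactly as in a regular boxplot.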

35 Learning Goal 2: Constructing Modified Boxplots – Example (vertical)
Draw a single vertical axis spanning the range of the data. Label and Scale. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.

36 Learning Goal 2: Constructing Modified Boxplots – Example (vertical)
Erect “fences” around the main part of the data. The upper fence is 1.5 IQRs above the upper quartile. The lower fence is 1.5 IQRs below the lower quartile. Note: the fences only help with constructing the boxplot and should not appear in the final display.

37 Learning Goal 2: Constructing Modified Boxplots – Example (vertical)
Use the fences to grow “whiskers.” Draw lines from the ends of the box up and down to the most extreme data values found within the fences. If a data value falls outside one of the fences, we do not connect it with a whisker.

38 Learning Goal 2: Constructing Modified Boxplots – Example (vertical)
Add the outliers by displaying any data values beyond the fences with special symbols. We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.

39 Learning Goal 2: Modified Boxplots – Example
[Figure: modified boxplot with Q1, the median, and Q3 labeled and the outliers marked as separate points.] Lines (“whiskers”) extend from each quartile to the most extreme value that is not an outlier.

40 Learning Goal 2: Modified Boxplots – Problem
Which boxplot goes with the histogram of waiting times for the bus? (a) (b) (c) The data do not show any low outliers.

41 Learning Goal 2: Constructing Modified Boxplots – Your Turn
Given the commuting times (in minutes) of 20 randomly selected New York workers below, construct a modified boxplot (same data as earlier). 10 30 5 25 40 20 15 85 65 60 45

42 Learning Goal 2: More Outliers
Far Outlier – a data value farther than 3 IQRs from the quartiles. [Figure: boxplot with outliers and a far outlier marked.]
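The distinction between an outlier and a far outlier can be sketched in Python. This is a minimal illustration, assuming the median-of-halves quartile rule; the function name `classify_outliers` is hypothetical, and the value 200 in the second data set is invented purely to show a far outlier.

```python
from statistics import median

def classify_outliers(values):
    """Return (value, label) pairs for every point beyond 1.5 IQRs of the quartiles."""
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[len(data) - half:])
    iqr = q3 - q1
    labeled = []
    for x in data:
        distance = max(q1 - x, x - q3)      # distance beyond the nearer quartile
        if distance > 3 * iqr:
            labeled.append((x, "far outlier"))
        elif distance > 1.5 * iqr:
            labeled.append((x, "outlier"))
    return labeled

data = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
print(classify_outliers(data))       # [(85, 'outlier'), (86, 'outlier')]

# Same data with 86 replaced by a made-up value of 200
altered = data[:-1] + [200]
print(classify_outliers(altered))    # [(85, 'outlier'), (200, 'far outlier')]
```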

43 Learning Goal 2: Boxplots – TI-84
Use the AL 2008 run data to construct a boxplot. AL runs scored:

44 Learning Goal 2: Boxplots – TI-84
Store this data as a list. Press STAT and choose the first option, EDIT. Enter the 14 AL data values in L1.

45 Learning Goal 2: Boxplots – TI-84
Now set up the boxplot. Exit back into the home screen. Then press STAT PLOT (2nd and y= ). Choose Plot1. Then, turn Plot1 on. Scroll to Type and choose the boxplot icon (with outliers). It is the first option in the second row. Enter L1 for Xlist. Enter 1 for Freq. Choose a mark for outliers.

46 Learning Goal 2: Boxplots – TI-84
Now display the graph. Press ZOOM. Then select option 9: ZOOMSTAT. Press enter. Press TRACE and scroll around to see different statistics for the distribution.

47 Learning Goal 2: Boxplots – TI-84
STAT, EDIT (enter data). STAT PLOT: two boxplot types (1st modified boxplot, 2nd regular boxplot). ZOOM, #9: ZoomStat. Use the TRACE key to read values and find outliers.

48 Learning Goal 2: Boxplots TI-84 - Your Turn
Data: 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86 Use TI-83/84

49 Learning Goal 3 Know how to compare the distribution of two or more groups by comparing their shapes, centers, and spreads.

50 Learning Goal 3: Comparing Distributions
We can answer much more interesting questions about variables when we compare distributions for different groups. Some of the most interesting statistics questions involve comparing two or more groups. Always discuss shape, center, spread, and possible outliers whenever you compare distributions of a quantitative variable.

51 Learning Goal 3: Comparing Distributions
When asked to compare two distributions, you must address four points: The shape The outliers The center The spread Think of the acronym SOCS to help you remember what to address.

52 Learning Goal 3: Describing and Comparing Distributions
Key Words: Comparing the Distribution of a Quantitative Variable (2 graphs).
S (Shape): roughly symmetric; (slightly/heavily) right-skewed; (slightly/heavily) left-skewed; unimodal, bell-shaped, bimodal; gap between ? and ?.
C (Center): typical value (midpoint, median); about the same, greater than, less than.
S (Spread): data varies from ? to ?; varies more than, varies less than.
O (Outliers): possible outlier at ?; possible outlier between ? and ?; no outlier.

53 Learning Goal 3: Comparing Distributions
Compare the histogram and boxplot for daily wind speeds: How does each display represent the distribution? The shape of a distribution is not always evident in a boxplot. Boxplots are particularly good at pointing out outliers.

54 Learning Goal 3: Comparing Distributions - Example
It is almost always more interesting to compare groups. With histograms, note the shapes, centers, and spreads of the two distributions. When using histograms to compare data sets make sure to use the same scale for both sets of data. What does this graphical display tell you?

55 Learning Goal 3: Comparing Distributions – Example Solution
The shapes, centers, and spreads of these two distributions are strikingly different. During spring and summer (histogram on the left), the distribution is skewed to the right. A typical day has an average wind speed of only 1 to 2 mph, with a range between 0.5 and 4 mph. In the colder months (histogram on the right), the shape is less strongly skewed and more spread out. The typical wind speed is higher, between 1 and 6 mph, and days with average wind speeds above 3 mph are not unusual. There is also an outlier at about 9 mph.

56 Learning Goal 3: Comparing Distributions - Example
Boxplots offer an ideal balance of information and simplicity, hiding the details while displaying the overall summary information, when comparing different groups. We often plot them side by side for groups or categories we wish to compare. What do these boxplots tell you?

57 Learning Goal 3: Comparing Distributions – Example Solution
By placing the boxplots side by side, we can easily see which groups have higher medians, which have the greater IQRs, where the middle 50% of the data is located in each group, and which have the greater overall range. When the boxes are placed in order, we can get a general idea of patterns in both the centers and the spreads. Equally important, we can see past any outliers in making these comparisons because they’ve been displayed separately.
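The same comparison can be made numerically from two five-number summaries. The sketch below is a rough Python illustration; the function name `compare_groups` is hypothetical, and the two summaries are the example summaries listed on the earlier five-number summary slide, used here only as convenient inputs.

```python
def compare_groups(summary_a, summary_b):
    """Compare two groups given as (min, Q1, median, Q3, max) tuples.

    Returns, for each measure, which group ("A", "B", or "tie") is larger.
    """
    def measures(s):
        return {"median": s[2], "IQR": s[3] - s[1], "range": s[4] - s[0]}

    a, b = measures(summary_a), measures(summary_b)
    result = {}
    for key in a:
        if a[key] > b[key]:
            result[key] = "A"
        elif a[key] < b[key]:
            result[key] = "B"
        else:
            result[key] = "tie"
    return result

group_a = (20, 27, 34, 50, 86)
group_b = (5, 10, 18.5, 29, 33)
print(compare_groups(group_a, group_b))
# {'median': 'A', 'IQR': 'A', 'range': 'A'}
```

Here group A has the higher center (median 34 vs. 18.5) and the greater spread (IQR 23 vs. 19, range 66 vs. 28), which is what side-by-side boxplots would show at a glance.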

58 Learning Goal 3: Comparing Distributions - Example
Here are boxplots for the number of runs scored in the AL and in the NL. (Note: the plots are on the same scale for comparison purposes.) Let’s compare using our four points (SOCS).

59 Learning Goal 3: Comparing Distributions – Example Solution
Shape The AL distribution is skewed slightly left (the left half of the distribution appears more spread out). The NL distribution is approximately symmetric.

60 Learning Goal 3: Comparing Distributions – Example Solution
Outliers Neither distribution contains an outlier.

61 Learning Goal 3: Comparing Distributions – Example Solution
Center: Typically, teams score more runs in the AL, because the median for the AL distribution, about 785, is higher than the median for the NL distribution, about 740.

62 Learning Goal 3: Comparing Distributions – Example Solution
Spread The AL distribution is slightly more spread out because it has both a larger range, 250 vs. 225, and larger IQR, around 100 vs. 70. This indicates there is more variability among AL teams and more consistency among NL teams.

63 Learning Goal 3: Comparing Distributions – Your Turn
These data represent the responses of 20 female and 20 male Statistics students to the question, “How many pairs of shoes do you have?” Construct a stemplot and compare the two distributions. Females 50 26 31 57 19 24 22 23 38 13 34 30 49 15 51 Males 14 7 6 5 12 38 8 10 11 4 22 35

64 Learning Goal 3: Comparing Distributions - Problem
Look at the following side-by-side boxplots and compare the female and male shoulder girth. Females have a typically smaller shoulder girth than males. Females have a typically larger shoulder girth than males. Females and males have about the same shoulder girths.

65 Learning Goal 3: Comparing Distributions - Problem
Look at the following side-by-side boxplots and compare the female and male thigh girth. Females have a typically smaller thigh girth than males. Females have a typically larger thigh girth than males. Females and males have about the same thigh girth.

66 Learning Goal 3: Comparing Distributions - Problem
Compare the centers of Distribution A (Female Shoulder Girth) and Distribution B (Male Shoulder Girth) shown below. The center of Distribution A is greater than the center of Distribution B. The center of Distribution A is less than the center of Distribution B. The center of Distribution A is equal to the center of Distribution B.

67 Learning Goal 3: Comparing Distributions - Problem
Compare the spreads of Distribution A (Female Shoulder Girth) and Distribution B (Male Shoulder Girth) shown below. The spread of Distribution A is greater than the spread of Distribution B. The spread of Distribution A is less than the spread of Distribution B. The spread of Distribution A is equal to the spread of Distribution B.

68 Learning Goal 4
Be able to display data over time: Time Plots.

69 Learning Goal 4: Time Plots
Used for displaying a time series, a data set collected over time. Plots each observation on the vertical scale against the time it was measured on the horizontal scale. Points are usually connected. Common patterns in the data over time, known as trends, should be noted.

70 Learning Goal 4: Time Plots - Example
A Time Plot from 1995 – 2001 of the number of people worldwide who use the Internet.

71 Learning Goal 4: Constructing Time Plots
To make a Time Plot: Put time on the horizontal scale. Put the variable being measured on the vertical scale. Connect the data points by lines. Interpreting Time Plots: look for an overall pattern, the TREND (upward or downward), and for outliers, which are strong deviations from the overall pattern or trend.

72 Learning Goal 4: Time Plots - Patterns
We describe time series by looking for an overall pattern and for striking deviations from that pattern. In a time series: A trend is a rise or fall that persists over time, despite small irregularities. A pattern that repeats itself at regular intervals of time is called seasonal variation.

73 Learning Goal 4: Describing Time Plots
This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing most likely due to similar seasonal variations in the production of fresh oranges. There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry. Retail price of fresh oranges over time

74 Learning Goal 4: Describing Time Plots
Scales matter. How you stretch the axes and choose your scales can give a different impression. A picture is worth a thousand words, BUT there is nothing like hard numbers. Look at the scales.

75 Learning Goal 4: Describing Time Plots
May contain outliers, deviations from the pattern. Try to explain. Interpolation – to predict a value from the pattern between known values. (within the data) Extrapolation – to predict a value by extending the pattern into the future (outside the data). Dangerous, no guarantee the pattern continues.
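The difference between interpolation and extrapolation can be illustrated with a short Python sketch. It assumes, purely for illustration, that the pattern between neighboring points is a straight line; the function name `predict` and the data values are made up, and the sketch expects at least two points with distinct times.

```python
def predict(times, values, t):
    """Estimate the value at time t by following a straight line segment.

    Inside the observed times this is interpolation; outside, the nearest
    end segment is extended, which is extrapolation.
    """
    points = sorted(zip(times, values))
    inside = points[0][0] <= t <= points[-1][0]
    segment = (points[-2], points[-1])          # used when t is past the last time
    if t < points[0][0]:
        segment = (points[0], points[1])        # used when t is before the first time
    else:
        for left, right in zip(points, points[1:]):
            if left[0] <= t <= right[0]:
                segment = (left, right)         # the two known points around t
                break
    (t0, v0), (t1, v1) = segment
    estimate = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return estimate, "interpolation" if inside else "extrapolation"

# Hypothetical time series
times = [1, 2, 3, 4]
values = [10, 14, 20, 28]
print(predict(times, values, 2.5))   # (17.0, 'interpolation')
print(predict(times, values, 6))     # (44.0, 'extrapolation')
```

The extrapolated value is only as good as the assumption that the last segment's slope continues, which is exactly why extrapolation is flagged as dangerous.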

76 Learning Goal 4: Time Plots - Problem
Which of these plots is a time plot? Choices: Plot A; Plot B; both; neither.

77 Learning Goal 4: Time Plots - Problem
What type of trend does the following time plot show? Choices: Downward; Upward; No trend.

78 Learning Goal 4: Time Plots - Problem
What would be the correct interpretation of the following graph? There is an upward trend in the data. There is no trend in the data. There is a downward trend in the data. The data show an existence of seasonal variation.

79 Learning Goal 5 Understand how re-expressing data can improve the symmetry of the data.

80 Learning Goal 5: Re-expressing Skewed Data to Improve Symmetry
When the data are skewed it can be hard to summarize them simply with a center and spread, and hard to decide whether the most extreme values are outliers or just part of a stretched out tail. How can we say anything useful about such data? The secret is to re-express the data by applying a simple function (logarithms, square roots, and reciprocals) to each value.

81 Learning Goal 5: Re-expressing Skewed Data to Improve Symmetry
One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function (e.g., logarithmic function). Note the change in skewness from the raw data (top) to the transformed data (right):
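The effect of a logarithmic re-expression can be checked numerically. The Python sketch below compares the two halves of the box (median minus Q1, and Q3 minus median) before and after taking logs. The data are hypothetical right-skewed values chosen for illustration, the helper name `box_halves` is made up, and base 2 is an arbitrary choice (any logarithm base gives the same symmetry).

```python
import math
from statistics import median

def box_halves(values):
    """Return (median - Q1, Q3 - median); equal halves suggest symmetry."""
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[len(data) - half:])
    m = median(data)
    return m - q1, q3 - m

raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]      # hypothetical right-skewed data
logged = [math.log2(x) for x in raw]          # re-express each value

print(box_halves(raw))      # (13.0, 80.0): upper half of the box is much longer
print(box_halves(logged))   # (2.5, 2.5): the box is now symmetric
```

Real data will rarely become perfectly symmetric, but the same comparison shows whether a re-expression has pulled in the long tail.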

