Presentation is loading. Please wait.

Presentation is loading. Please wait.

1/54 Statistics Descriptive Statistics— Tables and Graphics.

Similar presentations


Presentation on theme: "1/54 Statistics Descriptive Statistics— Tables and Graphics."— Presentation transcript:

1 1/54 Statistics Descriptive Statistics— Tables and Graphics

2 2/54 Summarizing Qualitative Data Summarizing Quantitative Data The Stem-and-Leaf Display( 枝葉圖 ) Cross-tabulations and Scatter Diagrams Contents

3 STATISTICS in PRACTICE The Colgate-Palmolive Company uses statistics in its quality assurance program for home laundry detergent products. One concern is customer satisfaction with the quantity of detergent in a carton. To control the problem of heavy detergent powder, limits are placed on the acceptable range of powder density.

4 STATISTICS in PRACTICE Statistical samples are taken and the density of each powder sample is measured. Data summaries are then provided for operating personnel so that corrective action can be taken if necessary to keep the density within the desired quality.

5 Summarizing Qualitative Data Frequency Distribution Relative Frequency Distributions Percent Frequency Distributions Bar Graphs Pie Charts

6 A frequency distribution is a tabular summary of A frequency distribution is a tabular summary of data showing the frequency (or number) of items data showing the frequency (or number) of items in each of several nonoverlapping classes. in each of several nonoverlapping classes. A frequency distribution is a tabular summary of A frequency distribution is a tabular summary of data showing the frequency (or number) of items data showing the frequency (or number) of items in each of several nonoverlapping classes. in each of several nonoverlapping classes. The objective is to provide insights about the data The objective is to provide insights about the data that cannot be quickly obtained by looking only at that cannot be quickly obtained by looking only at the original data. the original data. The objective is to provide insights about the data The objective is to provide insights about the data that cannot be quickly obtained by looking only at that cannot be quickly obtained by looking only at the original data. the original data. Summarizing Qualitative Data -- Frequency Distribution

7 Example: Data from a sample of 50 Soft Drink Purchases Summarizing Qualitative Data -- Frequency Distribution

8 Frequency Distribution Soft Drink Frequency Coke Classic 19 Diet Coke 8 Dr. Pepper 5 Pepsi-Cola 13 Sprite 5 Total 50 Summarizing Qualitative Data -- Frequency Distribution

9 Example: Marada Inn, Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. Summarizing Qualitative Data -- Frequency Distribution

10 The ratings provided by a sample of 20 guests are: Below Average Below Average Above Average Above Average Average Average Above Average Above Average Average Average Above Average Above Average Average Average Above Average Above Average Below Average Below Average Poor Poor Excellent Excellent Above Average Above Average Average Average Above Average Above Average Below Average Below Average Poor Poor Above Average Above Average Average Average Summarizing Qualitative Data -- Frequency Distribution

11 Poor Below Average Average Above Average Excellent 2 3 5 9 1 Total 20 RatingFrequency Summarizing Qualitative Data -- Frequency Distribution

12 The relative frequency of a class is the fraction or The relative frequency of a class is the fraction or proportion of the total number of data items proportion of the total number of data items belonging to the class. belonging to the class. The relative frequency of a class is the fraction or The relative frequency of a class is the fraction or proportion of the total number of data items proportion of the total number of data items belonging to the class. belonging to the class. Frequency of the class Relative frequency of a class= n Summarizing Qualitative Data – Relative Frequency

13 A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class. Summarizing Qualitative Data – Relative Frequency Distribution

14 Example: Relative and Percent Frequency Distribution of Soft Drink Purchases Soft Drink Relative Frequency Coke Classic.38 Diet Coke.16 Dr. Pepper.10 Pepsi-Cola.26 Sprite.10 Total 1.00 Summarizing Qualitative Data – Relative Frequency Distribution

15 Summarizing Qualitative Data – Percent Frequency Distribution The percent frequency of a class is the relative frequency multiplied by 100. The percent frequency of a class is the relative frequency multiplied by 100. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

16 Example: Percent Frequency Distribution of Soft Drink Purchases Soft Drink Percent Frequency Coke Classic 38 Diet Coke 16 Dr. Pepper 10 Pepsi-Cola 26 Sprite 10 Total 100 Summarizing Qualitative Data – Percent Frequency Distribution

17 Poor Below Average Average Above Average Excellent.10.15.25.45.05 Total 1.00 10 15 25 45 5 100 Relative RelativeFrequency Percent PercentFrequency Rating.10(100) = 10 1/20 =.05 Summarizing Qualitative Data – Relative and Percent Frequency Distribution

18 Summarizing Qualitative Data – Bar Graph A bar graph is a graphical device for depicting qualitative data. On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes. A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis) Using a bar of fixed width drawn above each class label, we extend the height appropriately.

19 Example: Bar Graph of Soft Drink Purchases Summarizing Qualitative Data – Bar Graph

20 Poor Below Average Below Average Above Average Above Average Excellent Frequency Rating 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 Marada Inn Quality Ratings Summarizing Qualitative Data – Bar Graph

21 Summarizing Qualitative Data – Pie Chart The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data. First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class. Since there are 360 degrees in a circle, a class with a relative frequency of.25 would consume.25(360) = 90 degrees of the circle.

22 Example: Pie Chart of Soft Drink Purchases Summarizing Qualitative Data – Pie Chart

23 Below Average 15% Below Average 15% Average 25% Average 25% Above Average 45% Above Average 45% Poor 10% Poor 10% Excellent 5% Marada InnQuality Ratings Marada Inn Quality Ratings Summarizing Qualitative Data – Pie Chart--Example

24 Insights Gained from the Preceding Pie Chart One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager. For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.

25 Summarizing Quantitative Data Frequency Distribution Relative Frequency and Percent Frequency Distributions Dot Plot Histogram Cumulative Distributions Ogive( 折線圖 )

26 The three steps necessary to define the classes for a frequency distribution with quantitative data are: 1. Determine the number of nonoverlapping classes. we recommend using between 5 and 20 classes. 2. Determine the width of each class. 3. Determine the class limits. Largest data value –Smallest data value Approximate class width= Number of classes Summarizing Quantitative Data-- Frequency Distribution

27 Use between 5 and 20 classes. Data sets with a larger number of elements usually require a larger number of classes. Smaller data sets usually require fewer classes

28 Guidelines for Selecting Width of Classes Use classes of equal width. Approximate Class Width = Largest data value –Smallest data value Number of classes Summarizing Quantitative Data-- Frequency Distribution

29 Example: These data show the time in days required to complete year-end audits for a sample of 20 clients of Sanderson and Clifford, a small public accounting firm with the data rounded to the nearest day. YEAR-END AUDIT TIMES (IN DAYS) 12141918 15151817 20272223 22213328 14181613 Summarizing Quantitative Data-- Frequency Distribution

30 Example: 1. Number of classes = 5 2. provides an approximate class width of (33 — 12)/5= 4.2. 3. We therefore decided to round up and use a class width of five days in the frequency distribution. Audit Time Frequency (days) 10-14 4 15-19 8 20-24 5 25-29 2 30-34 1 Total 20 Summarizing Quantitative Data-- Frequency Distribution

31 The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide. Summarizing Quantitative Data-- Frequency Distribution

32 Sample of Parts Cost for 50 Tune-ups Summarizing Quantitative Data-- Frequency Distribution

33 For Hudson Auto Repair, if we choose six classes: 50-59 60-69 70-79 80-89 90-99 100-109 2 13 16 7 7 5 Total 50 Parts Cost ($) Frequency Approximate Class Width = (109 - 52)/6 = 9.5  10 Summarizing Quantitative Data-- Frequency Distribution

34 50-59 60-69 70-79 80-89 90-99 100-109 Parts Cost ($).04.26.32.14.14.10 Total 1.00 Relative RelativeFrequency 4 26 32 14 14 10 100 Percent Frequency 2/50.04(100) Summarizing Quantitative Data-- Relative Frequency and Percent Frequency Distributions

35 Only 4% of the parts costs are in the $50-59 class. The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class. 30% of the parts costs are under $70. 10% of the parts costs are $100 or more. Insights Gained from the Percent Frequency Distribution Summarizing Quantitative Data-- Relative Frequency and Percent Frequency Distributions

36 Dot Plot One of the simplest graphical summaries of data is a dot plot. A horizontal axis shows the range of data values. Then each data value is represented by a dot placed above the axis. Example: Dot Plot for The Audit Time Data

37 Dot Plot

38 50 60 70 80 90 100 110 Cost ($) Dot Plot Tune-up Parts Cost...................................................

39 Histogram Another common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

40 Histogram Example: Histogram for The Audit Time Data

41 Histogram 2 2 4 4 6 6 8 8 10 12 14 16 18 Parts Cost ($) Parts Cost ($) Frequency 50-59 60-69 70-79 80-89 90-99 100-110 Tune-up Parts Cost

42 Histogram provides information about the shape. Symmetric Left tail is the mirror image of the right tail Examples: heights and weights of people Histogram Relative Frequency.05.10.15.20.25.30.35 0 0

43 Relative Frequency.05.10.15.20.25.30.35 0 0 Histogram Moderately Skewed Left A longer tail to the left Example: exam scores

44 Relative Frequency.05.10.15.20.25.30.35 0 0 Moderately Right Skewed A Longer tail to the right Example: housing values Histogram

45 Relative Frequency.05.10.15.20.25.30.35 0 0 Histogram Highly Skewed Right A very long tail to the right Example: executive salaries

46 Cumulative frequency distribution - shows the number of items with values less than or equal to the upper limit of each class.. Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class. Cumulative Distributions Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.

47 Cumulative Distributions Example: Cumulative Frequency, Cumulative Relative Frequency and Cumulative Percent Frequency Distributions for the Audit Data.

48 Cumulative Distributions Hudson Auto Repair <59 <69 <79 <89 <99 <109 Cost ($) Cumulative CumulativeFrequency RelativeFrequency CumulativePercent Frequency Frequency 2 15 31 38 45 50.04.30.62.76.90 1.00 4 30 62 76 90 100 2 + 13 15/50.30(100)

49 Ogive An ogive is a graph of a cumulative distribution. The data values are shown on the horizontal axis. Shown on the vertical axis are the: cumulative frequencies, or cumulative relative frequencies, or cumulative percent frequencies

50 Ogive The frequency (one of the above) of each class is plotted as a point. The plotted points are connected by straight lines.

51 Ogive These gaps are eliminated by plotting points halfway between the class limits. Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on. Hudson Auto Repair Because the class limits for the parts- cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.

52 Ogive Example: Ogive for the Audit Time Data.

53 Parts Parts Cost ($) Parts Parts Cost ($) 20 40 60 80 100 Cumulative Percent Frequency 50 60 70 80 90 100 110 (89.5, 76) Ogive with Cumulative Percent Frequencies Tune-up Parts Cost

54 Exploratory Data Analysis The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. One such technique is the stem-and-leaf display.

55 Stem-and-Leaf Display A stem-and-leaf display shows both the rank order and shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values. The first digits of each data item are arranged to the left of a vertical line.

56 Stem-and-Leaf Display To the right of the vertical line we record the last digit for each item in rank order. Each line in the display is referred to as a stem. Each digit on a stem is a leaf.

57 Stem-and-Leaf Display Example: Number of Questions Answered Correctly on An Aptitude Test

58 Stem-and-Leaf Display Stem: The numbers to the left of the vertical line (6, 7, 8, 9, 10, 11, 12, 13, and 14). Leaf: each digit to the right of the vertical line.

59 Stem-and-Leaf Display Although the stem-and-leaf display may appear to offer the same information as a histogram, it has two primary advantages. The stem-and-leaf display is easier to construct by hand. Within a class interval, the stem-and-leaf display provides more information than the histogram because the stem-and-leaf shows the actual data.

60 Example: Hudson Auto Repair The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

61 Example: Hudson Auto Repair Sample of Parts Cost for 50 Tune-ups

62 Stem-and-Leaf Display 5 6 7 8 9 10 2 7 2 2 2 2 5 6 7 8 8 8 9 9 9 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9 0 0 2 3 5 8 9 1 3 7 7 7 8 9 1 4 5 5 9 a stem a leaf

63 Stretched Stem-and-Leaf Display If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 - 4, and the second value corresponds to leaf values of 5 - 9.

64 Stretched Stem-and-Leaf Display 5 5 9 1 4 7 7 7 8 9 1 3 5 8 9 0 0 2 3 5 5 5 6 7 8 9 9 9 1 1 2 2 3 4 4 5 6 7 8 8 8 9 9 9 2 2 2 2 7 2 5 5 6 6 7 7 8 8 9 9 10 10

65 Stem-and-Leaf Display Leaf Units Where the leaf unit is not shown, it is assumed to equal 1. Leaf units may be 100, 10, 1, 0.1, and so on. In the preceding example, the leaf unit was 1. A single digit is used to define each leaf.

66 Example: Leaf Unit = 0.1 If we have data with values such as 8 9 10 11 Leaf Unit = 0.1 6 8 1 4 2 0 7 8.6 11.79.49.110.211.08.8 a stem-and-leaf display of these data will be

67 Example: Leaf Unit = 10 If we have data with values such as 16 17 18 19 Leaf Unit = 10 8 1 9 0 3 1 7 1806171719741791168219101838 a stem-and-leaf display of these data will be The 82 in 1682 is rounded down to 80 and is represented as an 8.

68 Cross-tabulations and Scatter Diagrams Thus far we have focused on methods that are used to summarize the data for one variable at a time. Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables. Cross-tabulation and a scatter diagram are two methods for summarizing the data for two (or more) variables simultaneously.

69 Crosstabulation The left and top margin labels define the classes for the two variables. Crosstabulation can be used when: one variable is qualitative and the other is quantitative, both variables are qualitative, or both variables are quantitative. A crosstabulation is a tabular summary of data for two variables.

70 Crosstabulation Example: Data from Zagat’s Restaurant Review. Data on a restaurant’s quality rating and typical meal price are reported. Quality rating is a qualitative variable with rating categories of good, very good, and excellent. Meal price is a quantitative variable that ranges from $10 to $49.

71 Crosstabulation

72 Example: Crosstabulation of Quality Rating and Meal Price for 300 Los Angeles Restaurants

73 Price Range Colonial Log Split A-Frame Total < $99,000 > $99,000 18 6 19 12 55 45 30 20 35 15 Total 100 12 14 16 3 Home Style Home Style Crosstabulation Example: Finger Lakes Homes The number of Finger Lakes homes sold for each style and price for the past two years is shown below. quantitative variable variable qualitative

74 Crosstabulation Insights Gained from Preceding Crosstabulation Only three homes in the sample are an A-Frame style and priced at more than $99,000. The greatest number of homes in the sample (19) are a split-level style and priced at less than or equal to $99,000.

75 PriceRange Colonial Log Split A-Frame Colonial Log Split A-Frame Total < $99,000 > $99,000 18 6 19 12 5545 30 20 35 15 Total 100 12 14 16 3 Home Style Home Style Crosstabulation Frequency distribution for the price variable Frequency distribution for the home style variable

76 Crosstabulation: Row or Column Percentages Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

77 PriceRange Colonial Log Split A-Frame Colonial Log Split A-Frame Total < $99,000 > $99,000 32.73 10.91 34.55 21.82 100100 Note: row totals are actually 100.01 due to rounding. 26.67 31.11 35.56 6.67 Home Style Home Style (Colonial and > $99K)/(All >$99K) × 100 = (12/45) × 100 Crosstabulation: Row Percentages

78 PriceRange Colonial Log Split A-Frame Colonial Log Split A-Frame < $99,000 > $99,000 60.00 30.00 54.29 80.00 40.00 70.00 45.71 20.00 Home Style Home Style 100 100 100 100 Total (Colonial and > $99K)/(All Colonial) × 100 = (12/30) × 100 Crosstabulation: Column Percentages

79 Crosstabulation: Simpson’s Paradox Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation. We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation. Simpson’ Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.

80 One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. A scatter diagram is a graphical presentation of the relationship between two quantitative variables. Scatter Diagram and Trendline

81 The general pattern of the plotted points suggests the overall relationship between the variables. Scatter Diagram and Trendline A trendline is an approximation of the relationship.

82 Scatter Diagram and Trendline A Positive Relationship x y

83 Scatter Diagram and Trendline A Negative Relationship x y

84 Scatter Diagram and Trendline No Apparent Relationship x y

85 Example: Panthers Football Team Scatter Diagram The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored. 1 3 2 1 3 14 24 18 17 30 x = Number of Interceptions y = Number of Points Scored Points Scored

86 Scatter Diagram y x Number of Interceptions Number of Points Scored 5 10 15 20 25 30 035 12304

87 Insights Gained from the Preceding Scatter Diagram The relationship is not perfect; all plotted points in the scatter diagram are not on a straight line. Higher points scored are associated with a higher number of interceptions. The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored. Example: Panthers Football Team

88 Tabular and Graphical Procedures Qualitative Data Quantitative Data Tabular Methods Tabular Methods Tabular Methods Tabular Methods Graphical Methods Graphical Methods Graphical Methods Graphical Methods Frequency Distribution Rel. Freq. Dist. Percent Freq. Distribution Crosstabulation Bar Graph Pie Chart Frequency Distribution Rel. Freq. Dist. Cum. Freq. Dist. Cum. Rel. Freq. Distribution Stem-and-Leaf Display Crosstabulation Dot Plot Histogram Ogive Scatter Diagram Data


Download ppt "1/54 Statistics Descriptive Statistics— Tables and Graphics."

Similar presentations


Ads by Google