Summarizing Categorical Data

Slides:



Advertisements
Similar presentations
X y Exploratory data analysis Cross tabulations and scatter diagrams.
Advertisements

Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
1 1 Slide © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
1 Probabilistic and Statistical Techniques Lecture 3 Dr. Nader Okasha.
1 1 Slide © University of Minnesota-Duluth, Summer 2009-Econ-2030(Dr. Tadesse) Chapter 2 Descriptive Statistics.
1 1 Slide © University of Minnesota-Duluth, Summer-2009 Econ-2030(Dr. Tadesse) Chapter-2: Descriptive Statistics: Tabular and Graphical Presentations Part.
1/54 Statistics Descriptive Statistics— Tables and Graphics.
QMS 6351 Statistics and Research Methods Chapter 2 Descriptive Statistics: Tabular and Graphical Methods Prof. Vera Adamchik.
1 1 Slide IS 310 – Business Statistics IS 310 Business Statistics CSU Long Beach.
Descriptive Statistics Summarizing qualitative data Summarizing quantitative data.
Chapter 2 Graphs, Charts, and Tables – Describing Your Data
1 1 Slide © 2006 Thomson/South-Western Chapter 2 Descriptive Statistics: Tabular and Graphical Presentations Part A n Summarizing Qualitative Data n Summarizing.
1 1 Slide © 2009 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS St. Edward’s University.
X y Exploratory data analysis Cross tabulations and scatter diagrams.
1 1 Slide IS 310 – Business Statistics IS 310 Business Statistics CSU Long Beach.
Summarizing Quantitative Data Frequency Distribution Relative Frequency and Percent Frequency Distributions Histogram Cumulative Distributions Ogive.
© 2006 by Thomson Learning, a division of Thomson Asia Pte Ltd.. 1 Slide Slide Slides Prepared by Juei-Chao Chen Fu Jen Catholic University Slides Prepared.
1 1 Slide 統計學 Fall 2003 授課教師:統計系余清祥 日期: 2003 年 9 月 23 日 第二週:敘述性統計量.
Econ 3790: Business and Economics Statistics
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 2 Descriptive Statistics: Tabular and Graphical Methods.
1 1 Slide IS 310 – Business Statistics IS 310 Business Statistics CSU Long Beach.
© 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license.
Descriptive Statistics: Tabular and Graphical Methods
1 1 Slide © 2006 Thomson/South-Western Chapter 2 Descriptive Statistics: Tabular and Graphical Presentations Part B n Exploratory Data Analysis n Crosstabulations.
DATA FROM A SAMPLE OF 25 STUDENTS ABBAB0 00BABB BB0A0 A000AB ABA0BA.
1 1 Slide © 2005 Thomson/South-Western Introduction to Statistics Chapter 2 Descriptive Statistics.
1 1 Slide © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Business Statistics **** Management Information Systems Business Statistics Third level First mid-term: Instructor: Dr. ZRELLI Houyem Majmaah.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS St. Edward’s University.
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Chapter 2 Descriptive Statistics: Tabular and Graphical Methods.
Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc.
Descriptive Statistics: Tabular and Graphical Presentations n Summarizing Qualitative Data n Summarizing Quantitative Data.
1 1 Slide STATISTICS FOR BUSINESS AND ECONOMICS Seventh Edition AndersonSweeneyWilliams Slides Prepared by John Loucks © 1999 ITP/South-Western College.
1 1 Slide © 2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
BIA 2610 – Statistical Methods Chapter 2 – Descriptive Statistics: Tabular and Graphical Displays.
1 1 Slide © 2005 Thomson/South-Western Introduction to Statistics Chapter 2 Descriptive Statistics.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Section 2-2 Frequency Distributions.
Chapter 2 – Descriptive Statistics
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 2 Descriptive Statistics: Tabular and Graphical Methods.
Chapter 2, Part A Descriptive Statistics: Tabular and Graphical Presentations n Summarizing Categorical Data n Summarizing Quantitative Data Categorical.
1 1 Slide © 2005 Thomson/South-Western OPIM 303-Lecture #1 Jose M. Cruz Assistant Professor.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 2 Descriptive Statistics: Tabular and Graphical Methods.
1 1 Slide Slides by JOHN LOUCKS St. Edward’s University.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS St. Edward’s University.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 2 Descriptive Statistics: Tabular and Graphical Methods.
ISTANBUL STOCK EXCHANGE (BIST) FELL 6 POINTS IN AVERAGE TODAY THE UNITED STATES DOLLAR (USD) APPRECIATED BY 4 PERCENT LAST WEEK AGAINST TURKISH LIRA (TRL).
Fundamentals of Business Statistics chapter2 descriptive statistics: tabular and graphical presentations.
Graphing options for Quantitative Data
Descriptive Statistics: Tabular and Graphical Methods
Chapter 2 Describing Data: Graphical
Descriptive Statistics: Tabular and Graphical Methods
Chapter 2 Descriptive Statistics
Warm Up.
Chapter 2 Descriptive Statistics: Tabular and Graphical Methods
CHAPTER 1: Picturing Distributions with Graphs
THE STAGES FOR STATISTICAL THINKING ARE:
THE STAGES FOR STATISTICAL THINKING ARE:
Topic 7: Visualization Lesson 1 – Creating Charts in Excel
Chapter 1: Exploring Data
Essentials of Statistics for Business and Economics (8e)
OAD30763 Statistics in Business and Economics
Organizing, Displaying and Interpreting Data
Design by : Ms Sheema Aftab
Displaying Distributions with Graphs
Chapter 1: Exploring Data
Frequency Distribution and Graphs
Fu Jen Catholic University
Presentation transcript:

Chapter 2 - Descriptive Statistics: Tabular and Graphical Presentations

Summarizing Categorical Data Frequency Distribution Relative Frequency Distribution Percent Frequency Distribution Bar Chart Pie Chart

Frequency Distribution Categorical data use labels or names to identify categories of like items. Quantitative data are numerical values that indicate how much or how many. Frequency Distribution A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes. The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Frequency Distribution Example: Marada Inn Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are: Above Average Below Average Poor Average Below Average Above Average Average Average Above Average Below Average Poor Excellent

Frequency Distribution Example: Marada Inn Rating Frequency Poor Below Average Average Above Average Excellent 2 3 5 9 1 Total 20

Using Excel’s COUNTIF Function to Construct a Frequency Distribution Excel Formula Worksheet Note: Rows 9-21 are not shown.

Relative Frequency Distribution Percent Frequency Distribution The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class. A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class. Percent Frequency Distribution The percent frequency of a class is the relative frequency multiplied by 100.

Relative Frequency and Percent Frequency Distributions Example: Marada Inn Relative Frequency Percent Frequency Poor Below Average Average Above Average Excellent Rating .10 .15 .25 .45 .05 Total 1.00 10 15 25 45 5 100 .10(100) = 10 1/20 = .05

Another Example Example Soft drink purchasers were asked to select one among the five popular soft drinks: Coca-cola, Diet coke, Dr.Pepper, Pepsi and Sprite. Soft drink selected by a sample of 20 purchasers are: Coca-Cola Pepsi Dr. Pepper Diet Coke Sprite Rating Frequency Coca-Cola 2 Diet Coke 3 Dr. Pepper 5 Pepsi 9 Sprite 1 Total 20

Bar Chart A bar chart is a graphical device for depicting qualitative data. On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes. A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis). Using a bar of fixed width drawn above each class label, we extend the height appropriately. The bars are separated to emphasize the fact that each class is a separate category.

Marada Inn Quality Ratings Bar Chart Marada Inn Quality Ratings 1 2 3 4 5 6 7 8 9 10 Frequency Rating Poor Below Average Average Above Average Excellent

Pie Chart The pie chart is a commonly used graphical device for presenting relative frequency and percent frequency distributions for categorical data. First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class. Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.

Marada Inn Quality Ratings Pie Chart Marada Inn Quality Ratings Excellent 5% Poor 10% Below Average 15% Above Average 45% Average 25%

Summarizing Quantitative Data Relative Frequency and Percent Frequency Distributions Dot Plot Histogram Cumulative Distributions Ogive

Frequency Distribution Example Sanderson and Clifford, a small public accounting firm wants to determine time in days required to complete year end audits. It takes a sample of 20 clients. 12 15 20 22 14 27 21 18 19 33 16 17 23 28 13

Frequency Distribution Example: Sanderson and Clifford If we choose five classes: Approximate Class Width = (33 - 12)/5 = 4.2   4 Time in days Frequency 10-14 4 15-19 8 20-24 5 25-29 2 30-34 1 Total 20

Using Excel’s PivotTable to Construct a Pivot Table Step 1 – Select any cell in the data set Step 2 – Click Insert on the Ribbon Step 3 – In the Tables group click Pivot Table Step 4 - When the Create PivotTable dialog box appears: Click (a PivotTable and PivotTable Fields dialog box will appear in a new worksheet)

Using Excel’s PivotTable to Construct a Pivot Table Step 5 - In the PivotTable Fields dialog box: Drag Audit time to the Rows area Drag Audit time to the Values area Step 6 - Click on Sum of Audit time in the Values area Step 7 - Click Value Field Settings from the list of options Step 8 - When the Value Field Settings dialog box appears: Under Summarize value field by, choose Count Click OK

Using Excel’s Pivot table to construct a frequency distribution

Using Excel’s PivotTable to Construct a Frequency Distribution To construct the frequency distribution, we must group the rows containing audit time. Step 1 - Right click any cell in the PivotTable report containing a an audit time. Step 2 - Choose Group from the list of options that appears Step 3 - When the Grouping dialog box appears: Enter 10 in the Starting at box Enter 34 in the Ending at box Enter 5 in the By box Click OK

Relative Frequency and Percent Frequency Distributions Example: Sanderson and Clifford Audit time (in days) Relative Frequency Percent Frequency 10 – 14 .20 (4/20) 20 (0.2 * 100) 15 – 19 .40 40 20 – 25 .25 25 25 – 29 .10 10 30 – 34 .05 5 Total 1.00 100

Relative Frequency and Percent Frequency Distributions Example: Sanderson and Clifford Insights obtained from the Percent Frequency Distribution: 40% of the audits required from 15 to 19 days. Another 25% of the audits required 20 to 25 days. Only 5% of the audits required more than 30 days.

Dot Plot Example: Hudson Auto Repair One of the simplest graphical summaries of data is a dot plot. Tune-up Parts Cost 50 60 70 80 90 100 110 Cost ($) A horizontal axis shows the range of data values. Then each data value is represented by a dot placed above the axis.

Histogram Another common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Using Excel’s Recommended Charts Tool to Construct a Histogram Step 1. Select any cell in the PivotTable report Step 2. Click Insert on the Ribbon Step 3. In the Charts group, click Recommended Charts Step 4. Click OK … editing options

Using Excel’s Recommended Charts Tool to Construct a Histogram Step 1. Right-click any bar in the chart and choose Format Data Series from the list of options Step 2. When the Format Data Series dialog box appears: Go to the Series Options section Set the Gap Width to 0 Click the Close button at the top right Step 3. Click the Chart Title and replace it with Tune-up Parts Cost Step 4. Click the Chart Elements button (in top right corner of the chart) … continued

Using Excel’s Recommended Charts Tool to Construct a Histogram Step 5. When the list of chart elements appears: Click Axis Titles Click Legend (to remove check in Legend box) Step 6. Click the Horizontal (Category) Axis Title and replace it with Parts Cost ($) Step 7. Click the Vertical (Value) Axis Title and replace it with Frequency

Histogram Example: Sanderson and Clifford

Histograms Showing Skewness Symmetric Left tail is the mirror image of the right tail Examples: heights and weights of people .05 .10 .15 .20 .25 .30 .35 Relative Frequency

Histograms Showing Skewness Moderately Skewed Left A longer tail to the left Example: exam scores Relative Frequency .05 .10 .15 .20 .25 .30 .35

Histograms Showing Skewness Moderately Right Skewed A Longer tail to the right Example: housing values Relative Frequency .05 .10 .15 .20 .25 .30 .35

Histograms Showing Skewness Highly Skewed Right A very long tail to the right Example: executive salaries Relative Frequency .05 .10 .15 .20 .25 .30 .35

Cumulative Distributions Cumulative frequency distribution - shows the number of items with values less than or equal to the upper limit of each class.. Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class. Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.

Cumulative Distributions Example: Sanderson and Cliffords Audit time (Days) Cumulative Frequency Cumulative Relative Frequency Cumulative Percent Frequency ≤ 14 4 .20 20 ≤ 19 12 .60 60 ≤ 24 17 .85 85 ≤ 29 19 .95 95 ≤ 34 1.00 100

Stem and Leaf Display A stem-and-leaf display shows both the rank order and shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values. The first digits of each data item are arranged to the left of a vertical line. To the right of the vertical line we record the last digit for each item in rank order. Each line (row) in the display is referred to as a stem. Each digit on a stem is a leaf.

Hudson Auto Repair Once Again The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. Sample of Parts Cost ($) for 50 Tune-ups 91 78 93 57 75 52 99 80 97 62 71 69 72 89 66 79 76 104 74 68 105 77 65 109 85 88 83 67 82 98 101 73

Stem-and-Leaf Display Example: Hudson Auto Repair 5 6 7 8 9 10 2 7 2 2 2 2 5 6 7 8 8 8 9 9 9 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9 0 0 2 3 5 8 9 1 3 7 7 7 8 9 1 4 5 5 9 a stem a leaf

Stretched Stem-and-Leaf Display If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display vertically by using two stems for each leading digit(s). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 - 4, and the second value corresponds to leaf values of 5 - 9.

Stretched Stem-and-Leaf Display Example: Hudson Auto Repair 5 5 9 1 4 7 7 7 8 9 1 3 5 8 9 0 0 2 3 5 5 5 6 7 8 9 9 9 1 1 2 2 3 4 4 5 6 7 8 8 8 9 9 9 2 2 2 2 7 2 5 6 8 9 10

Crosstabulations and Scatter Diagrams Thus far we have focused on methods that are used to summarize the data for one variable at a time. Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables. Crosstabulation and a scatter diagram are two methods for summarizing the data for two variables simultaneously.

Crosstabulation A crosstabulation is a tabular summary of data for two variables. Crosstabulation can be used when: one variable is qualitative and the other is quantitative, both variables are qualitative, or both variables are quantitative. The left and top margin labels define the classes for the two variables. Refer to Crosstabulation Example - Restaurant

Colonial Log Split A-Frame Crosstabulation Example: Finger Lakes Homes The number of Finger Lakes homes sold for each style and price for the past two years is shown below. quantitative variable categorical variable Home Style Price Range Colonial Log Split A-Frame Total 18 6 19 12 55 45 < $200,000 > $200,000 12 14 16 3 Total 30 20 35 15 100

Crosstabulation Example: Finger Lakes Homes Home Style Price Range Frequency distribution for the price range variable Example: Finger Lakes Homes Home Style Price Range Colonial Log Split A-Frame Total 18 6 19 12 55 45 < $200,000 > $200,000 12 14 16 3 Total 30 20 35 15 100 Frequency distribution for the home style variable

Crosstabulation: Row Percentages Example: Finger Lakes Homes Price Range Home Style Colonial Log Split A-Frame Total 32.73 10.91 34.55 21.82 100 < $200,000 > $200,000 26.67 31.11 35.56 6.67 Note: row totals are actually 100.01 due to rounding. (Colonial and > $200K)/(All > $200K) x 100 = (12/45) x 100

Crosstabulation: Column Percentages Example: Finger Lakes Homes Price Range Home Style Colonial Log Split A-Frame 60.00 30.00 54.29 80.00 < $200,000 > $200,000 40.00 70.00 45.71 20.00 Total 100 100 100 100 (Colonial and > $200K)/(All Colonial) x 100 = (12/30) x 100 Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

Crosstabulation: Simpson’s Paradox Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation. We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation. In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data. The reversal of conclusions based on aggregate and unaggregated data is called Simpson’s paradox. See Example in Book

Scatter Diagram and Trendline A scatter diagram is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. The general pattern of the plotted points suggests the overall relationship between the variables. A trendline provides an approximation of the relationship.

Scatter Diagram

Scatter Diagram The Panthers football team is interested in Example: Panthers Football Team The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored. x = Number of Interceptions y = Number of Points Scored 1 3 2 14 24 18 17 30

Scatter Diagram and Trendline Scatter Diagram for the Panthers 5 10 15 20 25 30 35 1 2 3 4 Number of Interceptions Number of Points Scored. Refer to Scatter Diagram – Stereo We can look at trendline

Example: Panthers Football Team Insights Gained from the Preceding Scatter Diagram The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored. Higher points scored are associated with a higher number of interceptions. The relationship is not perfect; all plotted points in the scatter diagram are not on a straight line.

Stacked Bar Chart A stacked bar chart is another way to display and compare two variables on the same display. It is a bar chart in which each bar is broken into rectangular segments of a different color. If percentage frequencies are displayed, all bars will be of the same height (or length), extending to the 100% mark.

Stacked Bar Chart

If multiple colors or lines are used, provide a legend. Data visualization describes the use of graphical displays to summarize and present information about a data set. The goal is to communicate as effectively and clearly as possible the key information about the data Creating effective graphical displays is as much art as it is science. Here are some guidelines . . . Give the display a clear and concise title. Keep the display simple. Clearly label each axis and provide the units of measure. If colors are used, make sure they are distinct. If multiple colors or lines are used, provide a legend.

Choosing the Type of Graphical Display to show the frequency distribution or relative frequency distribution for categorical data Bar Chart to show the relative frequency or percent frequency for categorical data Pie Chart Dot Plot to show the distribution for quantitative data over the entire range of the data to approximate the relationship of data in a scatter diagram Histogram Stem-and-Leaf Display to show both the rank order and shape of the distribution for quantitative data Side-by-Side Bar Chart to compare two variables to compare the relative frequency or percent frequency of two categorical variables Stacked Bar Chart to show the relationship between two quantitative variables Scatter Diagram Trendline to show the frequency distribution for quantitative data over a set of class intervals

Data Dashboards A data dashboard is a widely used data visualization tool. It organizes and presents key performance indicators (KPIs) used to monitor an organization or process. It provides timely, summary information that is easy to read, understand, and interpret. Some additional guidelines include . . . Minimize the need for screen scrolling. Avoid unnecessary use of color or 3D. Use borders between charts to improve readability.

Data Dashboard Example

Tabular and Graphical Methods Data Categorical Data Quantitative Data Tabular Methods Graphical Methods Tabular Methods Graphical Methods Bar Chart Pie Chart Frequency Distribution Rel. Freq. Dist. Percent Freq. Crosstabulation Dot Plot Histogram Ogive Stem-and- Leaf Display Scatter Diagram Frequency Distribution Rel. Freq. Dist. % Freq. Dist. Cum. Freq. Dist. Cum. Rel. Freq. Cum. % Freq. Crosstabulation

http://www.pmslweb.com/the-blog/wp-content/uploads/2012/03/31-probability-of-being-right-graph.png