Fu Jen Catholic University


1 Fu Jen Catholic University
Slides Prepared by Juei-Chao Chen Fu Jen Catholic University

2 Chapter 2 Descriptive Statistics: Tabular and Graphical Presentations Part B
Exploratory Data Analysis; Crosstabulations and Scatter Diagrams

3 2.3 Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. One such technique is the stem-and-leaf display.

4 Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values. The first digits of each data item are arranged to the left of a vertical line. To the right of the vertical line we record the last digit for each item in rank order. Each line in the display is referred to as a stem. Each digit on a stem is a leaf.
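The construction described above can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative integer data and a leaf unit of 1; the function name `stem_and_leaf` and the sample scores are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return 'stem | leaves' lines for non-negative integers (leaf unit 1)."""
    groups = defaultdict(list)
    for v in sorted(values):            # sorting puts the leaves in rank order
        stem, leaf = divmod(v, 10)      # leading digit(s) and last digit
        groups[stem].append(leaf)
    lines = []
    # Include empty stems so the display keeps the shape of the distribution.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical scores, for illustration only.
for line in stem_and_leaf([68, 72, 75, 81, 106]):
    print(line)
```

Keeping the empty stems is a design choice: it preserves the histogram-like shape the slide refers to.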

5 Stem-and-Leaf Display
Example: Number of Questions Answered Correctly on An Aptitude Test

6 Stem-and-Leaf Display
Stem: The numbers to the left of the vertical line (6, 7, 8, 9, 10, 11, 12, 13, and 14). Leaf: each digit to the right of the vertical line.

7 Stem-and-Leaf Display
Although the stem-and-leaf display may appear to offer the same information as a histogram, it has two primary advantages. 1. The stem-and-leaf display is easier to construct by hand. 2. Within a class interval, the stem-and-leaf display provides more information than the histogram because the stem-and-leaf shows the actual data.

8 Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

9 Example: Hudson Auto Repair
Sample of Parts Cost for 50 Tune-ups

10 Stem-and-Leaf Display
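[Stem-and-leaf display of the parts costs, with stems 5 through 10. The slide labels one entry as a stem and one digit as a leaf; the leaves were not preserved in this transcript.]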

11 Stretched Stem-and-Leaf Display
If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 - 4, and the second value corresponds to leaf values of 5 - 9.
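A stretched display can be sketched the same way, by keying each row on the stem and on which half of the leaf range the digit falls in. The function name `stretched_stem_and_leaf` and the sample values are hypothetical; the sketch assumes non-negative integers and a leaf unit of 1.

```python
def stretched_stem_and_leaf(values):
    """Return display lines with two rows per stem (low leaves, then high leaves)."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1    # first row of a stem takes leaves 0 to 4
        rows.setdefault((stem, half), []).append(leaf)
    stems = [stem for stem, _ in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):             # each stem value is stated twice
            leaves = " ".join(str(d) for d in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical values, for illustration only.
for line in stretched_stem_and_leaf([52, 57, 61, 63, 64, 68]):
    print(line)
```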

12 Stretched Stem-and-Leaf Display
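[Stretched stem-and-leaf display of the parts costs, with each of the stems 5 through 10 stated twice; the leaves were not preserved in this transcript.]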

13 Stem-and-Leaf Display
Leaf Units: A single digit is used to define each leaf. In the preceding example, the leaf unit was 1. Leaf units may be 100, 10, 1, 0.1, and so on. Where the leaf unit is not shown, it is assumed to equal 1.

14 Stem-and-Leaf Display
Example: Leaf Unit = 0.1. With a leaf unit of 0.1, a stem-and-leaf display of the data will be:
Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
Read back, the display represents the values 8.6, 8.8, 9.1, 9.4, 10.2, 11.0, and 11.7.

15 Stem-and-Leaf Display
Example: Leaf Unit = 10. With a leaf unit of 10, a stem-and-leaf display of the data will be:
Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7
The 82 in 1682 is rounded down to 80 and is represented as an 8.
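The leaf-unit rule can be expressed as a small helper: divide the value by the leaf unit, drop anything smaller than one leaf unit, and split off the last digit. This is a sketch under stated assumptions; the name `split_stem_leaf` and the small tolerance used to absorb floating-point error are not from the slides.

```python
import math

def split_stem_leaf(value, leaf_unit):
    """Return (stem, leaf) for a non-negative value at the given leaf unit."""
    # Express the value in leaf units and truncate what is left over.
    # The 1e-9 tolerance is an assumption that guards against float error
    # (for example, 8.6 / 0.1 evaluates to slightly less than 86).
    units = math.floor(value / leaf_unit + 1e-9)
    return divmod(units, 10)

print(split_stem_leaf(1682, 10))   # the 82 is cut down to 80, leaf 8 on stem 16
print(split_stem_leaf(8.6, 0.1))   # stem 8, leaf 6
```

Truncating rather than rounding to nearest matches the slide's statement that the 82 in 1682 is rounded down to 80.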

16 2.4 Crosstabulations and Scatter Diagrams
Thus far we have focused on methods that are used to summarize the data for one variable at a time. Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables. Crosstabulation and a scatter diagram are two methods for summarizing the data for two (or more) variables simultaneously.

17 Crosstabulation
A crosstabulation is a tabular summary of data for two variables. Crosstabulation can be used when one variable is qualitative and the other is quantitative, both variables are qualitative, or both variables are quantitative. The left and top margin labels define the classes for the two variables.

18 Crosstabulation
Example: Data from Zagat’s Restaurant Review. Data on a restaurant’s quality rating and typical meal price are reported. Quality rating is a qualitative variable with rating categories of good, very good, and excellent. Meal price is a quantitative variable that ranges from $10 to $49.

19 Crosstabulation

20 Crosstabulation
Example: Crosstabulation of Quality Rating and Meal Price for 300 Los Angeles Restaurants

21 Crosstabulation
Example: Finger Lakes Homes. The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price range is the quantitative variable and home style is the qualitative variable.
Price Range   Colonial   Log   Split   A-Frame   Total
≤ $99,000                                           55
> $99,000                                           45
Total                                              100
(Only the totals survive in this transcript; the other cell counts were not preserved.)

22 Crosstabulation Insights Gained from Preceding Crosstabulation
The greatest number of homes in the sample (19) are a split-level style and priced at less than or equal to $99,000. Only three homes in the sample are an A-Frame style and priced at more than $99,000.

23 Crosstabulation
In the Finger Lakes Homes crosstabulation, the Total column (55 homes at ≤ $99,000, 45 homes at > $99,000, 100 in all) is the frequency distribution for the price variable. The Total row is the frequency distribution for the home style variable.

24 Crosstabulation: Row or Column Percentages
Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

25 Crosstabulation: Row Percentages
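Each count is expressed as a percentage of its row (price range) total, so every row of the table (≤ $99,000 and > $99,000, across Colonial, Log, Split, and A-Frame) is shown with a total of 100. Note: the row totals do not sum to exactly 100 due to rounding. Example: (Colonial and > $99K)/(All > $99K) × 100 = (12/45) × 100 ≈ 26.67.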

26 Crosstabulation: Column Percentages
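Each count is expressed as a percentage of its column (home style) total, so every column (Colonial, Log, Split, A-Frame) sums to 100 over the two price ranges. Example: (Colonial and > $99K)/(All Colonial) × 100 = (12/30) × 100 = 40.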
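The row and column percentage calculations can be sketched as follows. The slides quote only some of the counts (12 Colonial homes above $99,000, 19 Split homes at or below $99,000, 3 A-Frame homes above $99,000, and the totals 55, 45, 30, and 100); the remaining cells in this sketch are assumed values chosen to agree with those totals, and the function names are hypothetical.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Counts per price range, in the order of `styles`.
# Cells not quoted in the slides are ASSUMED so that the totals match.
counts = {
    "<= $99,000": [18, 6, 19, 12],   # row total 55
    "> $99,000": [12, 14, 16, 3],    # row total 45
}

def row_percentages(table):
    """Each cell as a percentage of its row total."""
    return {row: [round(100 * c / sum(cells), 2) for c in cells]
            for row, cells in table.items()}

def column_percentages(table):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {row: [round(100 * c / t, 2) for c, t in zip(cells, col_totals)]
            for row, cells in table.items()}

print(row_percentages(counts)["> $99,000"][0])     # (12/45) x 100
print(column_percentages(counts)["> $99,000"][0])  # (12/30) x 100
```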

27 Crosstabulation: Simpson’s Paradox
Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation. We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation. Simpson’s Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.
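A small numerical sketch may make the reversal concrete. All counts below are hypothetical and chosen only to produce the effect: option A has the higher success rate within each group, yet option B has the higher rate once the groups are aggregated.

```python
# Hypothetical (successes, trials) for two options in two groups.
data = {
    "group 1": {"A": (90, 100), "B": (800, 1000)},
    "group 2": {"A": (300, 1000), "B": (20, 100)},
}

# Success rate of each option within each group (unaggregated view).
per_group = {group: {opt: s / n for opt, (s, n) in cells.items()}
             for group, cells in data.items()}

def aggregated_rates(table):
    """Pool the groups, then compute each option's overall success rate."""
    totals = {}
    for cells in table.values():
        for opt, (s, n) in cells.items():
            ts, tn = totals.get(opt, (0, 0))
            totals[opt] = (ts + s, tn + n)
    return {opt: s / n for opt, (s, n) in totals.items()}

overall = aggregated_rates(data)
print(per_group)   # A beats B in both groups
print(overall)     # B beats A after aggregation
```

The reversal arises because the two options are spread very unevenly across the groups, so the aggregated rates are dominated by different groups.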

28 Scatter Diagram and Trendline
A scatter diagram is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. The general pattern of the plotted points suggests the overall relationship between the variables. A trendline is an approximation of the relationship.

29 Scatter Diagram and Trendline
A Positive Relationship (scatter diagram of y against x)

30 Scatter Diagram and Trendline
A Negative Relationship (scatter diagram of y against x)

31 Scatter Diagram and Trendline
No Apparent Relationship (scatter diagram of y against x)

32 Example: Panthers Football Team
Scatter Diagram. The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored. The slide tabulates x = Number of Interceptions against y = Number of Points Scored; the values that survive in this transcript are 1, 3, 2 for x and 14, 24, 18, 17, 30 for y.

33 Scatter Diagram
[Scatter diagram: Number of Interceptions (x, horizontal axis) against Number of Points Scored (y, vertical axis).]

34 Example: Panthers Football Team
Insights Gained from the Preceding Scatter Diagram: The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored. Higher points scored are associated with a higher number of interceptions. The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.
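One common way to obtain a trendline is an ordinary least-squares fit; the slides do not say which method is used, so treat the following as a sketch under that assumption. The y values are those listed for the Panthers example, but the transcript preserves only three x values, so the last two x values below (1 and 3) are assumed for illustration. The function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

interceptions = [1, 3, 2, 1, 3]     # last two values are ASSUMED
points = [14, 24, 18, 17, 30]       # as listed in the slides

slope, intercept = least_squares_line(interceptions, points)
print(slope > 0)                    # a positive slope means a positive relationship
```

A positive slope agrees with the positive relationship noted above, and the scatter of the points around the line reflects that the relationship is not perfect.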

35 Tabular and Graphical Procedures
Qualitative Data
Tabular Methods: Frequency Distribution, Rel. Freq. Dist., Percent Freq. Dist., Crosstabulation
Graphical Methods: Bar Graph, Pie Chart
Quantitative Data
Tabular Methods: Frequency Distribution, Rel. Freq. Dist., Cum. Freq. Dist., Cum. Rel. Freq. Dist., Stem-and-Leaf Display, Crosstabulation
Graphical Methods: Dot Plot, Histogram, Ogive, Scatter Diagram

36 End of Chapter 2, Part B

