1
Graphical Descriptive Techniques
2
Frequency Distribution
Guidelines for Selecting Number of Classes
Use between 5 and 20 classes.
Data sets with a larger number of elements usually require a larger number of classes.
Smaller data sets usually require fewer classes.
3
Frequency Distribution
Guidelines for Selecting Width of Classes
Use classes of equal width.
Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
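The width guideline can be tried out in a few lines of Python. This is a minimal sketch: the function name `approximate_class_width` and the sample values are illustrative assumptions, not data from the slides.

```python
import math

def approximate_class_width(values, n_classes):
    """Approximate class width = (largest value - smallest value) / number of classes."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    return (max(values) - min(values)) / n_classes

# Hypothetical sample (not the slide data), chosen only to illustrate the rule.
sample = [3, 14, 22, 35, 47, 60]
raw_width = approximate_class_width(sample, 6)   # (60 - 3) / 6

# In practice the raw width is rounded up to a convenient value.
convenient_width = math.ceil(raw_width)
print(raw_width, convenient_width)
```

Rounding up rather than down keeps the chosen number of classes wide enough to cover the whole range of the data.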
4
Example: Hudson Auto Repair
The manager of Hudson Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken and the costs of parts, rounded to the nearest dollar, are listed below.
5
Example: Hudson Auto Repair
Frequency Distribution
If we choose six classes:
Approximate Class Width = (largest cost - smallest cost)/6 = 9.5 ≈ 10
[Table: Cost ($), Frequency, Total]
6
Example: Hudson Auto Repair
Relative Frequency and Percent Frequency Distributions
[Table: Cost ($), Relative Frequency, Percent Frequency, Total]
7
Example: Hudson Auto Repair
Insights Gained from the Percent Frequency Distribution
Only 4% of the parts costs are in the $50-59 class.
30% of the parts costs are under $70.
The greatest percentage (32%, or almost one-third) of the parts costs are in the $70-79 class.
10% of the parts costs are $100 or more.
8
Graphical Techniques for Interval Data
Example 1: Providing information concerning the monthly bills of new subscribers in the first month after signing on with a telephone company.
Collect data
Prepare a frequency distribution
Draw a histogram
9
Example 1: Providing information
Collect data
Prepare a frequency distribution
How many classes to use?
[Table: Number of observations vs. Number of classes, from "Less than ..." to "More than ..."]
Class width = [Range] / [# of classes] = [Largest observation - Smallest observation] / [8] (there are 200 data points)
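Once the class width is fixed, preparing the frequency distribution is a counting exercise. The sketch below assumes classes of width 15 starting at 0 (matching the eight classes and the 15 to 120 axis in this example) and treats each class as including its lower limit. The function name `frequency_distribution` and the bills are hypothetical, not the 200 observations from the slides.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Count the values in each class [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    upper = lower + width * n_classes
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"{v} is outside the class limits")
        # A value equal to the overall upper limit goes into the last class.
        index = min(int((v - lower) // width), n_classes - 1)
        counts[index] += 1
    return counts

# Hypothetical monthly bills in dollars (illustration only).
bills = [3.5, 14.0, 15.0, 29.9, 44.0, 61.2, 75.0, 89.9, 104.5, 120.0]
counts = frequency_distribution(bills, lower=0, width=15, n_classes=8)
print(counts)
```

Whether a boundary value belongs to the lower or the upper class is a convention; whichever is chosen should be applied to every class.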
10
Example 1: Providing information
Draw a Histogram
11
Example 1: Providing information
What information can we extract from this histogram?
About half of all the bills are small (71 + 37 = 108).
A few bills are in the middle range (32).
A relatively large number of bills are large (60).
[Histogram: Frequency (0 to 80) against Bills (class limits 15, 30, 45, 60, 75, 90, 105, 120)]
12
Relative frequency
It is often preferable to show the relative frequency (proportion) of observations falling into each class, rather than the frequency itself.
Relative frequencies should be used when:
the population relative frequencies are studied
two or more histograms are compared
the numbers of observations in the samples studied are different
Class relative frequency = Class frequency / Total number of observations
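A short sketch shows why relative frequencies help when sample sizes differ. The two sets of class frequencies below are hypothetical; the raw counts look very different, but the proportions are identical.

```python
def relative_frequencies(frequencies):
    """Class relative frequency = class frequency / total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

# Hypothetical class frequencies for two samples of different size.
sample_a = [10, 25, 15]     # 50 observations
sample_b = [40, 100, 60]    # 200 observations

rel_a = relative_frequencies(sample_a)
rel_b = relative_frequencies(sample_b)
print(rel_a)
print(rel_b)
```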
13
Class width
It is generally best to use equal class widths, but sometimes unequal class widths are called for. Unequal class widths are used when the frequency associated with some classes is too low. Then several classes are combined to form a wider and "more populated" class. It is possible to form an open-ended class at the higher end or lower end of the histogram.
14
Shapes of histograms Symmetry
There are four typical shape characteristics
15
Shapes of histograms: Skewness
[Figures: a negatively skewed histogram and a positively skewed histogram]
16
Modal classes
A modal class is the one with the largest number of observations.
[Figure: a unimodal histogram with the modal class marked]
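Finding the modal class from a frequency distribution can be sketched as follows. The labels and frequencies are hypothetical. The second helper, `local_peaks`, is an added assumption about how one might flag several peaks (as in a bimodal histogram); it does not detect flat-topped peaks.

```python
def modal_classes(labels, frequencies):
    """Return the label(s) of the class(es) with the largest frequency."""
    largest = max(frequencies)
    return [label for label, f in zip(labels, frequencies) if f == largest]

def local_peaks(frequencies):
    """Indices of classes whose frequency is higher than both neighbours."""
    padded = [0] + list(frequencies) + [0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]

# Hypothetical frequency distributions (illustration only).
labels = ["0-15", "15-30", "30-45", "45-60"]
unimodal = [4, 11, 7, 2]
two_peaks = [3, 9, 4, 2, 8, 5]

print(modal_classes(labels, unimodal))
print(local_peaks(two_peaks))
```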
17
Modal classes
[Figure: a bimodal histogram with its two modal classes marked]
18
Bell shaped histograms
Many statistical techniques require that the population be bell shaped. Drawing the histogram helps us to verify the shape of the population in question.
19
Interpreting histograms
Example 2: Selecting an investment
An investor is considering investing in one out of two investments. The returns on these investments were recorded. From the two histograms, how can the investor interpret:
The expected returns
The spread of the returns (the risk involved with each investment)
20
Example 2 - Histograms
[Histograms: Return on investment A and Return on investment B, with the center of each marked]
Interpretation: The center of the returns of Investment A is slightly lower than that for Investment B.
21
Example 2 - Histograms
[Histograms: Return on investment A and Return on investment B, sample size = 50 each, annotated with the counts 17, 16, 34, 26, 46, 43]
Interpretation: The spread of returns for Investment A is less than that for Investment B.
22
Example 2 - Histograms
[Histograms: Return on investment A and Return on investment B]
Interpretation: Both histograms are slightly positively skewed. There is a possibility of large returns.
23
Providing information
Example 2: Conclusion
It seems that investment A is better, because:
Its expected return is only slightly below that of investment B.
The risk from investing in A is smaller.
The possibility of having a high rate of return exists for both investments.
24
Interpreting histograms
Example 3: Comparing students' performance
Students' performance in two statistics classes was compared. The two classes differed in their teaching emphasis:
Class A - mathematical analysis and development of theory.
Class B - applications and computer-based analysis.
The final mark for each student in each course was recorded. Draw histograms and interpret the results.
25
Interpreting histograms
The mathematical emphasis creates two groups, and a larger spread.
26
STRIP PLOTS
A strip chart is the most basic type of plot available. It plots the data in order along a line, with each data point represented as a box. In R:
> stripchart(data)
By default there is no title and there are no axis labels. The chart only shows how the data look if you put them all along one line and mark a box at each point. If you would prefer to see which points are repeated, you can specify that repeated points be stacked:
> stripchart(data, method="stack")
27
STRIP CHART
28
Stem and Leaf Display
This is a graphical technique most often used in a preliminary analysis. Stem and leaf diagrams use the actual values of the original observations (whereas the histogram does not).
29
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values. The first digits of each data item are arranged to the left of a vertical line. To the right of the vertical line we record the last digit for each item in rank order. Each line in the display is referred to as a stem. Each digit on a stem is a leaf.
30
Stem-and-Leaf Display
Leaf Units A single digit is used to define each leaf. In the preceding example, the leaf unit was 1. Leaf units may be 100, 10, 1, 0.1, and so on. Where the leaf unit is not shown, it is assumed to equal 1.
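The construction described above can be sketched in Python. The function name `stem_and_leaf`, its `leaf_unit` parameter and the sample values are illustrative assumptions. The sketch assumes non-negative data and drops any digits beyond the leaf digit rather than rounding them.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Build a stem-and-leaf display as a list of text lines."""
    stems = defaultdict(list)
    for v in values:
        # Express the value in leaf units; round(..., 6) only removes
        # floating-point noise before the extra digits are truncated.
        scaled = int(round(v / leaf_unit, 6))
        stem, leaf = divmod(scaled, 10)   # the last digit is the leaf
        stems[stem].append(leaf)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in sorted(stems[stem]))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

# Hypothetical data (illustration only), leaf unit 1.
data = [52, 55, 61, 68, 68, 73, 75, 79, 84, 97]
for line in stem_and_leaf(data):
    print(line)
```

Calling `stem_and_leaf(values, leaf_unit=0.1)` or `leaf_unit=10` applies the same split to data recorded in tenths or in tens.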
31
Example: Leaf Unit = 0.1
If we have data with values such as ..., a stem-and-leaf display of these data will be ... (Leaf Unit = 0.1)
32
Example: Leaf Unit = 10
If we have data with values such as ..., a stem-and-leaf display of these data will be ... (Leaf Unit = 10)
33
Stem and Leaf Display
Split each observation into two parts. There are several ways of doing that. For the same observation:
Stem 42, Leaf 19
Stem 4, Leaf 2
A stem and leaf display for Example 1 will use this method next.
34
Stem and Leaf Display
A stem and leaf display for Example 1
[Display: Stem | Leaf]
The length of each line represents the frequency of the class defined by the stem.
35
Ogives
Ogives are cumulative relative frequency distributions.
Example 1 - continued
Bills (upper class limit)    Cumulative relative frequency
 15                          .355
 30                          .540
 45                          .605
 60                          .650
 75                          .700
 90                          .790
105                          .930
120                          1.000
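The ogive values for Example 1 can be reproduced by accumulating class frequencies. The frequencies below are not listed on the slides; they are inferred from the differences between successive cumulative values for the 200 bills, so treat them as an assumption. The function name `ogive_points` is illustrative.

```python
from itertools import accumulate

def ogive_points(upper_limits, frequencies):
    """Pair each upper class limit with its cumulative relative frequency."""
    total = sum(frequencies)
    cumulative = accumulate(frequencies)
    return [(limit, round(c / total, 3))
            for limit, c in zip(upper_limits, cumulative)]

# Inferred class frequencies for the 200 bills (assumption, see lead-in).
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

points = ogive_points(upper_limits, frequencies)
for limit, value in points:
    print(f"{limit:>3}  {value:.3f}")
```

Plotting these points and joining them with straight lines gives the ogive.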
36
Summarizing Qualitative Data
Frequency Distribution
Relative Frequency
Percent Frequency Distribution
Bar Graph
Pie Chart
37
Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes. The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
38
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are shown below.
Below Average, Average, Above Average, Above Average, Above Average,
Above Average, Above Average, Below Average, Below Average, Average,
Poor, Poor, Above Average, Excellent, Above Average,
Average, Above Average, Average, Above Average, Average
39
Frequency Distribution
Example: Marada Inn
Frequency Distribution
Rating           Frequency
Poor              2
Below Average     3
Average           5
Above Average     9
Excellent         1
Total            20
40
Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class. A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
41
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
42
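Example: Marada Inn
Relative Frequency and Percent Frequency Distributions
Rating           Relative Frequency    Percent Frequency
Poor              .10                   10
Below Average     .15                   15
Average           .25                   25
Above Average     .45                   45
Excellent         .05                    5
Total            1.00                  100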
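The three distributions for the Marada Inn sample can be tabulated directly from the 20 ratings listed earlier. This is a minimal sketch using Python's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn example.
ratings = [
    "Below Average", "Average", "Above Average", "Above Average", "Above Average",
    "Above Average", "Above Average", "Below Average", "Below Average", "Average",
    "Poor", "Poor", "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average", "Above Average", "Average",
]
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
total = len(ratings)

# One row per rating: frequency, relative frequency, percent frequency.
table = [(r, counts[r], counts[r] / total, 100 * counts[r] / total) for r in order]

for rating, freq, rel, pct in table:
    print(f"{rating:<14} {freq:>3} {rel:>6.2f} {pct:>5.0f}")
```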
43
Graphical Techniques for Nominal data
The only allowable calculation on nominal data is to count the frequency of each value of a variable. When the raw data can be naturally categorized in a meaningful manner, we can display frequencies by:
Bar charts, which emphasize the frequency of occurrence of the different categories.
Pie charts, which emphasize the proportion of occurrences of each category.
44
The Pie Chart The pie chart is a circle, subdivided into a number of slices that represent the various categories. The size of each slice is proportional to the percentage corresponding to the category it represents.
45
Pie Charts The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data. First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class. Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
46
Example: Marada Inn Pie Chart
47
Example: Marada Inn Insights Gained from the Preceding Pie Chart
One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager. For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.
48
The Pie Chart
Example 4
The student placement office at a university wanted to determine the general areas of employment of last year's graduates. Data were collected, and the count of the occurrences was recorded for each area. These counts were converted to proportions, and the results were presented as a pie chart and a bar chart.
49
The Pie Chart
(28.9/100)(360°) = 104°
Accounting 28.9%
Marketing 25.3%
Finance 20.6%
General management 14.2%
Other 11.1%
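The slice angles follow from multiplying each share by 360 degrees. The sketch below applies that rule to the Example 4 percentages; the function name `sector_angle` is an illustrative assumption.

```python
def sector_angle(percent):
    """Central angle in degrees for a slice covering `percent` of the pie."""
    return percent / 100 * 360

# Percentages from the Example 4 pie chart.
shares = {
    "Accounting": 28.9,
    "Marketing": 25.3,
    "Finance": 20.6,
    "General management": 14.2,
    "Other": 11.1,
}

angles = {area: sector_angle(p) for area, p in shares.items()}
for area, angle in angles.items():
    print(f"{area:<18} {angle:6.1f} degrees")
```

Because the published percentages are rounded, they need not sum to exactly 100, so the angles may not sum to exactly 360.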
50
Pie Charts
Advantages. Pie charts:
display relative proportions of multiple classes of data
can be sized so that the circle is proportional to the total quantity it represents
summarize a large data set in visual form
are visually simpler than other types of graphs
permit a visual check of the reasonableness or accuracy of calculations
require minimal additional explanation
are easily understood due to widespread use in business and the media
Disadvantages. Pie charts:
do not easily reveal exact values
may be needed in large numbers to show changes over time
fail to reveal key assumptions, causes, effects, or patterns
can be easily manipulated to yield false impressions
51
Bar Graph A bar graph is a graphical device for depicting qualitative data. On the horizontal axis we specify the labels that are used for each of the classes. A frequency, relative frequency, or percent frequency scale can be used for the vertical axis. Using a bar of fixed width drawn above each class label, we extend the height appropriately. The bars are separated to emphasize the fact that each class is a separate category.
52
The Bar Chart Rectangles represent each category.
The height of the rectangle represents the frequency. The base of the rectangle is arbitrary.
[Bar chart with bar heights 73, 64, 52, 36, 28]
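A bar chart can be roughed out in plain text, with one row of marks per category. The bar heights 73, 64, 52, 36 and 28 come from the slide; pairing them with the Example 4 employment areas is an assumption, made because it reproduces the pie-chart percentages. The function name `text_bar_chart` is illustrative.

```python
def text_bar_chart(frequencies, scale=2):
    """Return one text line per category; bar length is frequency // scale."""
    width = max(len(label) for label in frequencies)
    return [f"{label:<{width}} | {'#' * (count // scale)} {count}"
            for label, count in frequencies.items()]

# Bar heights from the slide; the category labels are an assumed pairing.
counts = {
    "Accounting": 73,
    "Marketing": 64,
    "Finance": 52,
    "General management": 36,
    "Other": 28,
}
total = sum(counts.values())
percents = {area: round(100 * c / total, 1) for area, c in counts.items()}

lines = text_bar_chart(counts)
for line in lines:
    print(line)
```

The bars carry the counts, while `percents` holds the corresponding proportions that a pie chart would show.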
53
Example: Marada Inn
54
The Bar Chart
Use bar charts also when the order in which nominal data are presented is meaningful.
[Bar chart: total number of new products introduced in North America in the years 1989, ..., 1994; vertical axis from 5,000 to 20,000]
55
Describing the Relationship Between Two Variables
We are interested in the relationship between two interval variables.
Example 7
A real estate agent wants to study the relationship between house price and house size. Twelve recently sold houses are sampled, and their size and price are recorded. Use a graphical technique to describe the relationship between size and price.
[Table with columns Size and Price; surviving values: 315, 229, 335, 261, ...]
56
Describing the Relationship Between Two Variables
Solution
The size (independent variable, X) affects the price (dependent variable, Y).
We use Excel to create a scatter diagram.
[Scatter diagram: price (Y) against size (X)]
The greater the house size, the greater the price.
57
Typical Patterns of Scatter Diagrams
[Panels: Positive linear relationship; No relationship; Negative linear relationship; Negative nonlinear relationship; Nonlinear (concave) relationship]
This is a weak linear relationship. A nonlinear relationship seems to fit the data better.
58
Graphing the Relationship Between Two Nominal Variables
We create a contingency table. This table lists the frequency for each combination of values of the two variables. We can create a bar chart that represents the frequency of occurrence of each combination of values.
59
Crosstabulation
Crosstabulation is a tabular method for summarizing the data for two variables simultaneously. Crosstabulation can be used when:
One variable is qualitative and the other is quantitative
Both variables are qualitative
Both variables are quantitative
The left and top margin labels define the classes for the two variables.
60
Contingency table Example 8
To conduct an efficient advertising campaign, the relationship between occupation and newspaper readership is studied. The following table was created.
61
Contingency table Solution
If there is no relationship between occupation and newspaper read, the bar charts describing the frequency of readership of newspapers should look similar across occupations.
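The comparison described here amounts to looking at row proportions of the contingency table. The sketch below uses a small set of hypothetical responses (not the Example 8 table) and two illustrative helper functions. If occupation and newspaper were unrelated, the two rows of proportions would look alike.

```python
from collections import Counter

def contingency_table(pairs):
    """Count each (occupation, newspaper) combination."""
    return Counter(pairs)

def row_proportions(table, row, columns):
    """Share of each column value within one row of the table."""
    row_total = sum(table[(row, c)] for c in columns)
    return {c: table[(row, c)] / row_total for c in columns}

# Hypothetical survey responses (illustration only).
responses = (
    [("Blue collar", "Star")] * 3 + [("Blue collar", "Post")] * 1 +
    [("Professional", "Star")] * 1 + [("Professional", "Post")] * 3
)
newspapers = ["Star", "Post"]

table = contingency_table(responses)
blue = row_proportions(table, "Blue collar", newspapers)
prof = row_proportions(table, "Professional", newspapers)
print(blue)
print(prof)
```

Here the proportions differ sharply between the two occupations, which is the pattern that signals a relationship.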
62
Bar charts for a contingency table
Blue-collar workers prefer the “Star” and the “Sun”. White-collar workers and professionals mostly read the “Post” and the “Globe and Mail”
63
Describing Time-Series Data
Data can be classified according to the time it is collected. Cross-sectional data are all collected at the same time. Time-series data are collected at successive points in time. Time-series data are often depicted on a line chart (a plot of the variable over time).
64
Line Chart
Example 9
The total amounts of income tax paid by individuals in 1987 through 1999 are listed below. Draw a graph of these data and describe the information produced.
65
Line Chart
For the first five years, total tax was relatively flat.
From 1993 there was a rapid increase in tax revenues. Line charts can be used to describe nominal data time series.
66
Tabular and Graphical Procedures