3 Types of Variables/data IntervalRatioNominalOrdinal
4 Graphically Summarizing Qualitative Data With qualitative data, names identify the different and non-overlapping categories (classes)This data can be summarized using a frequency distributionFrequency distribution: A table that summarizes the number of items in each of several non-overlapping categories/classes.
5 Example 2.1: Describing 2006 Jeep Purchasing Patterns Table 2.1 lists all 251 vehicles sold in 2006 by the greater Cincinnati Jeep dealersTable 2.1 does not reveal much useful informationA frequency distribution is a useful summarySimply count the number of times each model appears in Table 2.1
6 Table 2.1 lists all 251 vehicles sold in 2006 by the greater Cincinnati Jeep dealers
7 The Resulting Frequency Distribution Jeep ModelFrequencyCommander71Grand Cherokee70Liberty80Wrangler30251
8 Relative Frequency and Percent Frequency Relative frequency summarizes the proportion of items in each classFor each class, divide the frequency of the class by the total number of observationsMultiply times 100 to obtain the percent frequency
9 The Resulting Relative Frequency and Percent Frequency Distribution Jeep ModelRelative FrequencyPercent FrequencyCommander0.282928.29%Grand Cherokee0.278927.89%Liberty0.318731.78%Wrangler0.119511.95%1.0000100.00%
10 Bar Charts and Pie Charts Bar chart: A vertical or horizontal rectangle represents the frequency for each categoryHeight can be frequency, relative frequency, or percent frequencyPie chart: A circle divided into slices where the size of each slice represents its relative frequency or percent frequency
12 Excel Pie Chart of the Jeep Sales Data Pie chart is usually for percent Frequency Distribution.
13 Pareto ChartPareto chart: a type of chart that contains both bars and a line graphA bar chart having the different kinds of categories listed on the horizontal scaleBar height represents the frequency of occurrenceBars are arranged in decreasing height from left to rightAugmented by plotting a cumulative percentage point for each bar
14 Excel Frequency Table and Pareto Chart of Labeling Defects
16 Graphically Summarizing Quantitative Data Often need to summarize and describe the shape of the distributionOne way is to group the measurements into classes of a frequency distribution and then displaying the data in the form of a HistogramA Histogram is a graph in which the class (numerical) midpoints or limits are marked on the horizontal axis and the class frequencies on the vertical axis.
17 Frequency Distribution A frequency distribution is a list of data classes with the count of values that belong to each class“Classify and count”The frequency distribution is a tableShow the frequency distribution in a histogrampicture of the frequency distribution (tabulated frequencies) shown as bars and the bars are drawn adjacent to each other.
18 Constructing a Frequency Distribution Steps in making a frequency distribution:Find the number of classesFind the class lengthForm non-overlapping classes of equal widthTally and countGraph the histogram
19 Example 2.2 The Payment Time Case: A Sample of Payment Times (in days after billing) 222916151817121319102114202523242627Table 2.4
20 Determine the number of Classes If the number of classes istoo small lack of detailtoo large some classes will be emptyGroup all of the n data into K number of classesGeneral rule: K is the smallest whole number for which 2K nIn Examples 2.2 n = 65For K = 6, 26 = 64, < nFor K = 7, 27 = 128, > nSo use K = 7 classes
21 Number of Classes In General Size of Data Set21≤n<434≤n<848≤n<16516≤n<32632≤n<64764≤n<1288128≤n<2569256≤n<52810528≤n<1056
22 Determine the Class Length Find the length of each class as the largest measurement minus the smallest divided by the number of classes found earlier (K)For Example 2.2, (29-10)/7 =Because payments measured in days, round to three days
23 Form Non-Overlapping Classes of Equal Width The classes start on the smallest valueThis is the lower limit of the first classThe upper limit of the first class is smallest value + class lengthIn the example, the first class starts at 10 days and goes up to 13 daysThe next class starts at this upper limit and goes up by class lengthAnd so on
24 Seven Non-Overlapping Classes Payment Time Example 10 days and less than 13 daysClass 213 days and less than 16 daysClass 316 days and less than 19 daysClass 419 days and less than 22 daysClass 522 days and less than 25 daysClass 625 days and less than 28 daysClass 728 days and less than 31 days
25 Tally and Count the Number of Measurements in Each Class First 4 Tally MarksAll 65 Tally MarksFrequency10 < 13|||313 < 16|||| |||| ||||1416 < 19|||||| |||| |||| |||| |||2319 < 22I|||| |||| ||1222 < 25||||| |||825 < 28||||428 < 311
26 Histogram Rectangles represent the classes The base represents the class lengthThe height representsthe frequency in a frequency histogram, orthe relative frequency in a relative frequency histogram
27 HistogramsFrequency HistogramRelative Frequency Histogram
28 Example of a frequency distribution histogram for discrete data The frequency distribution of quiz scores in a class. The score, X, is the number of problems that is answered correctly.
29 Example of a frequency distribution histogram for continuous data
31 CONTINUOUS VARIABLES AND REAL LIMITS For a continuous variable, each score actually corresponds to an interval on the scale. The boundaries that separate these intervals are called real limits.The real limit separating two adjacent scores is located exactly halfway between the scores. Each score has two real limits, one at the top of its interval called the upper real limit and one at the bottom of its interval called the lower real limit.
32 Some Common Distribution Shapes Skewed to the right: The right tail of the histogram is longer than the left tailSkewed to the left: The left tail of the histogram is longer than the right tailSymmetrical: The right and left tails of the histogram appear to be mirror images of each other
36 Frequency PolygonsPlot a point above each class midpoint at a height equal to the frequency of the classUseful when comparing two or more distributions
37 Example 2.3: Comparing The Grade Distribution for Two Statistics Exams Table 2.8 (in textbook) gives scores earned by 40 students on first statistics examTable 2.9 gives the scores on the second exam after an attendance policyDue to the way exams are reported, used the classes: 30<40, 40<50, 50<60, 60<70, 70<80, 80<90, and 90<100
39 A Percent Frequency Polygon Comparing the Two Exam Scores
40 Cumulative Distributions Another way to summarize a distribution is to construct a cumulative distributionTo do this, use the same number of classes, class lengths, and class boundaries used for the frequency distributionRather than a count, we record the number of measurements that are less than the upper boundary of that classIn other words, a running total
41 Frequency, Cumulative Frequency, and Cumulative Relative Frequency Distribution ClassFrequencyCumulative FrequencyCumulative Relative FrequencyCumulative Percent Frequency10 < 1333/65=0.04624.62%13 < 16141717/65=0.261526.15%16 < 1923400.615461.54%19 < 2212520.800080.00%22 < 258600.923192.31%25 < 284640.984698.46%28 < 311651.0000100.00%
42 Ogive (Cumulative Frequency distribution) Ogive: A graph of a cumulative distributionPlot a point above each upper class boundary at height of cumulative frequencyConnect points with line segmentsCan also be drawn usingCumulative relative frequenciesCumulative percent frequencies
45 Cumulative Frequency Distribution For Hours Studying
46 Dot PlotsOn a number line, each data value is represented by a dot placed above the corresponding scale valueDot plots are useful for detecting outliersUnusually large or small observations that are well separated from the remaining observations
48 Stem-and-Leaf Display Purpose is to see the overall pattern of the data, by grouping the data into classesthe variation from class to classthe amount of data in each classthe distribution of the data within each classBest for small to moderately sized data distributions
49 A set of 24 quiz scores presented as raw data and organized in a Stem-and-Leaf Display
57 Comparing Two Distributions with back-to-back bar charts Back-to-Back histogram DisplayComparing Two Distributions with back-to-back bar charts
58 Comparing Two Distributions with back-to-back bar charts Back-to-Back histogram DisplayComparing Two Distributions with back-to-back bar charts
59 Scatter PlotsUsed to study relationships between two quantitative variablesPlace one variable on the x-axisPlace a second variable on the y-axisPlace dot on pair coordinates
60 Types of Relationships Linear: A straight line relationship between the two variablesPositive: When one variable goes up, the other variable goes upNegative: When one variable goes up, the other variable goes downNo Linear Relationship: There is no coordinated linear movement between the two variables
61 A Scatter Plot Showing a Positive Linear Relationship
62 A Scatter Plot Showing a Little or No Linear Relationship
63 A Scatter Plot Showing a Negative Linear Relationship
64 Misleading Graphs and Charts: Scale Break Break the vertical scale to exaggerate effectMean Salaries at a Major University,
65 Misleading Graphs and Charts: Horizontal Scale Effects
66 You can use simple mathematical operations (like averages) to create nonsensical “facts” that can drive whatever agenda you’d like. Example: the average wealth of the citizens of a particular town is $100,000, therefore they don’t need any government assistance. (The town consists of 1 stingy millionaire and 9 homeless people.)