2 Chapter Outline
2.1 Frequency Distributions and Their Graphs
2.2 More Graphs and Displays
2.3 Measures of Central Tendency
2.4 Measures of Variation
2.5 Measures of Position
Larson/Farber 4th ed.
3 Overview
Descriptive Statistics: Describes the important characteristics of a set of data.
Organize, present, and summarize data:
1. Graphically
2. Numerically
4 Important Characteristics of Quantitative Data: “Shape, Center, and Spread”
Center: A representative or average value that indicates where the middle of the data set is located.
Variation: A measure of the amount that the values vary among themselves.
Distribution: The nature or shape of the distribution of the data (such as bell-shaped, uniform, or skewed).
6 Section 2.1: Frequency Distributions and Their Graphs
7 Frequency Distributions
A table that organizes data values into classes or intervals, along with the number of values that fall in each class (the frequency, f).
Ungrouped Frequency Distribution: for data sets with few different values. Each value is in its own class.
Grouped Frequency Distribution: for data sets with many different values, which are grouped together into classes.
8 Grouped and Ungrouped Frequency Distributions

Ungrouped:
Courses Taken   Frequency, f
1               25
2               38
3               217
4               1462
5               932
6               15

Grouped:
Age of Voters   Frequency, f
18-30           202
31-42           508
43-54           620
55-66           413
67-78           158
79-90           32
9 Ungrouped Frequency Distributions
Number of Peas in a Pea Pod (sample size: 50)
Table columns: Peas per pod; Freq, f
10 Graphs of Frequency Distributions: Frequency Histograms
A bar graph that represents the frequency distribution.
The horizontal scale is quantitative and measures the data values.
The vertical scale measures the frequencies of the classes.
Consecutive bars must touch.
11 Frequency Histogram Ex. Peas per Pod Peas per pod Freq, f 1 2 3 5 4 9 186127
12 Relative Frequency Distributions and Relative Frequency Histograms
Relative Frequency Distribution: Shows the portion or percentage of the data that falls in each class (relative frequency = class frequency / sample size = f / n).
Relative Frequency Histogram: Has the same shape and the same horizontal scale as the corresponding frequency histogram. The vertical scale measures the relative frequencies, not the frequencies.
13 Relative Frequency Histogram
Has the same shape and horizontal scale as a frequency histogram, but the vertical scale is marked with relative frequencies.
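As a rough illustration of how relative frequencies are obtained, the sketch below divides each class frequency by the total count. It reuses the "Age of Voters" frequencies from slide 8; the function name relative_frequencies is a hypothetical helper, not something from the slides.

```python
def relative_frequencies(freqs):
    """Return each class frequency divided by the total number of values."""
    total = sum(freqs)
    return [f / total for f in freqs]

# Frequencies from the "Age of Voters" table (slide 8), in class order.
voter_freqs = [202, 508, 620, 413, 158, 32]
voter_rel = relative_frequencies(voter_freqs)

for f, r in zip(voter_freqs, voter_rel):
    print(f"f = {f:4d}   relative frequency = {r:.3f}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the arithmetic.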
14 Grouped Frequency Distributions
For data sets with many different values.
Groups data into 5-20 classes of equal width.
Exam Scores   Freq, f
30-39         1
40-49
50-59         4
60-69         9
70-79         13
80-89         10
90-99         3
15 Grouped Frequency Distribution Terms
Lower class limits: the smallest numbers that can belong to each class.
Upper class limits: the largest numbers that can belong to each class.
Class width: the difference between two consecutive lower class limits.
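16 Labeling Grouped Frequency Distributions
Class midpoints: the value halfway between the lower class limit (LCL) and the upper class limit (UCL) of a class: $\text{midpoint} = \frac{\text{LCL} + \text{UCL}}{2}$
Class boundaries: the value halfway between a UCL and the next LCL.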
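A small sketch of both definitions, using the first three exam-score classes from slide 14. The helper names class_midpoint and class_boundary are assumptions made for this example.

```python
def class_midpoint(lcl, ucl):
    """Value halfway between a class's lower and upper limits."""
    return (lcl + ucl) / 2

def class_boundary(ucl, next_lcl):
    """Value halfway between one class's upper limit and the next lower limit."""
    return (ucl + next_lcl) / 2

# First three exam-score classes (slide 14): (LCL, UCL)
exam_classes = [(30, 39), (40, 49), (50, 59)]

midpoints = [class_midpoint(lcl, ucl) for lcl, ucl in exam_classes]
boundaries = [
    class_boundary(exam_classes[i][1], exam_classes[i + 1][0])
    for i in range(len(exam_classes) - 1)
]

print("Midpoints: ", midpoints)
print("Boundaries:", boundaries)
```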
17 Constructing a Grouped Frequency Distribution
Determine the range of the data: Range = highest data value – lowest data value. You may round up to the next convenient number.
Decide on the number of classes. Usually between 5 and 20; otherwise, it may be difficult to detect any patterns.
Find the class width: class width = range / number of classes. Round up to the next convenient number.
18 Constructing a Frequency Distribution
Find the class limits.
  Choose the first LCL: use the minimum data entry or a convenient smaller value.
  Find the remaining LCLs: add the class width to the lower limit of the preceding class.
  Find the UCLs: remember that the classes must cover all data values and cannot overlap.
Find the frequency for each class. (You may add a tally column first and make a tally mark for each data value in the class.)
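The construction steps can be sketched in a few lines of Python. This is only one possible reading of the procedure: it assumes whole-number data, starts the first class at the minimum entry, and always rounds the class width up to the next whole number so that the last class is guaranteed to contain the maximum. The data set and the function name grouped_frequency are made up for illustration.

```python
def grouped_frequency(data, num_classes):
    """Build (LCL, UCL, frequency) rows for whole-number data."""
    low, high = min(data), max(data)
    data_range = high - low
    # Round up to the next whole number (assumption: integer data).
    width = data_range // num_classes + 1
    rows = []
    for i in range(num_classes):
        lcl = low + i * width          # add the width to the previous LCL
        ucl = lcl + width - 1          # classes must not overlap
        f = sum(1 for x in data if lcl <= x <= ucl)
        rows.append((lcl, ucl, f))
    return rows

# Hypothetical exam scores, not taken from the slides.
scores = [35, 52, 58, 61, 67, 73, 75, 78, 84, 88, 91, 99]
table = grouped_frequency(scores, num_classes=7)

for lcl, ucl, f in table:
    print(f"{lcl}-{ucl}: {f}")
```

Every data value lands in exactly one class, so the frequencies should sum to the sample size.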
19 “Shape” of Distributions
Symmetric: Data is symmetric if the left half of its histogram is roughly a mirror image of its right half.
Skewed: Data is skewed if it is not symmetric and if it extends more to one side than the other.
Uniform: Data is uniform if it is equally distributed (on a histogram, all the bars are the same height or approximately the same height).
20 The Shape of Distributions: Symmetric, Uniform, Skewed Left, Skewed Right
21 Outliers
Unusual data values as compared to the rest of the set. They may be distinguished by gaps in a histogram.
22 Section 2.2: More Graphs and Displays
23 Other Graphs
Besides histograms, there are other methods of graphing quantitative data:
Stem and Leaf Plots
Dot Plots
Time Series
24 Stem and Leaf Plots
Represents data by separating each data value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).
25 Constructing Stem and Leaf Plots
Split each data value at the same place value to form a stem and a leaf. (Aim for 5-20 stems.)
Arrange all possible stems vertically so there are no missing stems.
Write each leaf to the right of its stem, in order.
Create a key to recreate the data.
Variations of stem plots:
  Split stems
  Back-to-back stem plots
26 Constructing a Stem-and-Leaf Plot
Include a key to identify the values of the data.
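A minimal sketch of the construction for two-digit whole numbers, assuming the tens digit is the stem and the ones digit is the leaf. The data values and the function name stem_and_leaf are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot for two-digit whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):                 # sorting keeps the leaves in order
        leaves[v // 10].append(v % 10)       # stem = tens digit, leaf = ones digit
    rows = []
    # List every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

# Hypothetical data, not taken from the slides.
values = [23, 25, 31, 38, 38, 52]
plot = stem_and_leaf(values)

print("\n".join(plot))
print("Key: 2 | 3 = 23")
```

Note that stem 4 appears with no leaves, so the gap in the data stays visible.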
27 Dot Plots
Dot plot: A graph in which each data value is plotted as a point along a scale of values (Figure 2-5).
28 Time Series (Paired Data)
A data set composed of quantitative entries taken at regular intervals over a period of time, e.g., the amount of precipitation measured each day for one month.
Use a time series chart to graph the data, with time on the horizontal axis and the quantitative data on the vertical axis.
29 Time-Series Graph
Figure 2-8: Number of Screens at Drive-In Movie Theaters
Ex.: www.eia.doe.gov/oil_gas/petroleum/
30 Graphing Qualitative Data Sets
Pie Chart: A circle is divided into sectors that represent categories.
Pareto Chart: A vertical bar graph in which the height of each bar represents frequency or relative frequency (horizontal axis: categories; vertical axis: frequency).
31 Constructing a Pie Chart
Find the total sample size.
Convert the frequencies to relative frequencies (percent).
Marital Status   Frequency, f (in millions)   Relative frequency (%)
Never Married    55.3
Married          127.7
Widowed          13.9
Divorced         22.8
Total:
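The two steps can be sketched with the marital-status frequencies from the table. The central angle (360° times the relative frequency) is added here as a common convention for sizing pie sectors; it is not stated on the slide, and the variable names are assumptions.

```python
# Frequencies (in millions) from the Marital Status table.
marital = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

total = sum(marital.values())                                  # total sample size
percents = {k: 100 * f / total for k, f in marital.items()}    # relative frequency (%)
angles = {k: 360 * f / total for k, f in marital.items()}      # sector angle (degrees)

for status in marital:
    print(f"{status:14s} {percents[status]:5.1f}%  {angles[status]:6.1f} degrees")
print(f"Total: {total:.1f} million")
```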
32 Constructing Pareto Charts
Create a bar for each category, where the height of the bar represents frequency or relative frequency.
The bars are often positioned in order of decreasing height, with the tallest bar at the left (Figure 2-6).
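The ordering step amounts to sorting the categories by frequency, largest first. The sketch below borrows the marital-status frequencies from slide 31 purely as sample input; the variable names are hypothetical.

```python
# Category frequencies (in millions), borrowed from the pie chart example.
category_freqs = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

# Sort by frequency, tallest bar first (left to right on the chart).
pareto_order = sorted(category_freqs, key=category_freqs.get, reverse=True)

for name in pareto_order:
    print(f"{name:14s} {category_freqs[name]:6.1f}")
```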
33 Section 2.3: Measures of Central Tendency
34 Measures of Central Tendency
Measure of central tendency: A value that represents a typical, or central, entry of a data set.
Most common measures of central tendency:
Mean
Median
Mode
35 Measure of Central Tendency: Mean
Mean: The sum of all the data entries divided by the number of entries.
Population mean: $\mu = \frac{\sum x}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Round-off rule for measures of center: Carry one more decimal place than is in the original values. Do not round until the last step.
36 Measure of Central Tendency: Median
The value that lies in the middle of the data when the data set is arranged in order from lowest to highest.
Measures the center of an ordered data set by dividing it into two equal parts.
A sample median is often denoted $\tilde{x}$.
If the data set has an
  odd number of entries: the median is the middle data entry.
  even number of entries: the median is the mean of the two middle data entries.
37 Computing the Median
If the data set has an:
  odd number of entries: the median is the middle data entry (the exact middle value).
  even number of entries: the median is the mean of the two middle data entries.
38 Measure of Central Tendency: Mode
The data entry that occurs with the greatest frequency.
If no entry is repeated, the data set has no mode.
If two entries occur with the same greatest frequency, each entry is a mode (bimodal).
a) Mode is 1.10   b) Bimodal & 55   c) No mode
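A short sketch of all three measures. Mean and median come from Python's standard statistics module; the modes helper is a hypothetical function written to follow the slide's convention that a data set with no repeated entry has no mode. The sample data is made up.

```python
from collections import Counter
from statistics import mean, median

def modes(data):
    """Return all entries with the greatest frequency, or [] if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:                      # no entry is repeated: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical sample with an even number of entries.
sample = [2, 4, 4, 5, 7, 9]

sample_mean = mean(sample)            # sum of entries / number of entries
sample_median = median(sample)        # mean of the two middle entries (4 and 5)
sample_modes = modes(sample)

print(f"mean = {sample_mean:.1f}, median = {sample_median}, mode(s) = {sample_modes}")
```

With an odd number of entries, median returns the single middle entry instead of averaging two values.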
39 Comparing the Mean, Median, and Mode
All three measures describe an “average”. Choose the one that best represents a “typical” value in the set.
Mean: The most familiar average. A reliable measure because it takes into account every entry of a data set. May be greatly affected by outliers or skew.
Median: A common average. Not as affected by skew or outliers.
Mode: May be used if one value repeats overwhelmingly often.
40 Choosing the “Best Average” The shape of your data and the existence of any outliers may help you choose the best average:
41 Section 2.4: Measures of Variation
42 Measures of Variation (“Spread”)
Another important characteristic of quantitative data is how much the data varies, or is spread out.
The two most common methods of measuring spread are:
Range
Standard deviation and variance
43 Range
The difference between the maximum and minimum data entries in the set.
The data must be quantitative.
Range = (Max. data entry) – (Min. data entry)
44 Example: Finding the Range
The wait time to see a bank teller is studied at two banks. Bank A has multiple lines, one for each teller. Bank B has a single wait line for the first available teller. Five wait times (in minutes) are sampled from each bank:
Bank A:
Bank B:
Find the mean, median, and range for each bank.
45 Solution: Finding the Range
Bank A: Range = ?
Bank B: Range = ?
Note: The range is easy to compute, but it uses only two values. Do the following two sets vary the same?
Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
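One way to explore the question about Set A and Set B is to compute the range of each and compare it with a measure that uses every value. The sketch below uses the sample standard deviation from Python's statistics module as that second measure; the helper name data_range is an assumption.

```python
from statistics import stdev

def data_range(data):
    """Range = maximum entry - minimum entry."""
    return max(data) - min(data)

set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1, 10, 10, 10, 10, 10, 10, 10, 10, 10]

range_a, range_b = data_range(set_a), data_range(set_b)
sd_a, sd_b = stdev(set_a), stdev(set_b)

print(f"Set A: range = {range_a}, s = {sd_a:.2f}")
print(f"Set B: range = {range_b}, s = {sd_b:.2f}")
```

Both sets have the same range, yet their standard deviations differ, because the range looks only at the two extreme values.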
46 Standard Deviation and Variance
Measures the typical amount data deviates from the mean.
Sample variance, $s^2$: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation, $s$: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
47 Finding Sample Variance & Standard Deviation
Find the mean of the sample data set.
Find the deviation of each entry from the mean.
Square each deviation.
Add to get the sum of the squared deviations.
Divide by n – 1 to get the sample variance.
Find the square root to get the sample standard deviation.
48 Find the Standard Deviation and Variance for Bank A (multi-line)
Wait time, x (in min)   Deviation: x – x̄       Squares: (x – x̄)²
5.2                     5.2 – 7.3 = –2.1       (–2.1)² = 4.41
6.2                     6.2 – 7.3 =            ( )² =
7.5                     7.5 – 7.3 =
8.4                     8.4 – 7.3 =
9.2                     9.2 – 7.3 =            ( )² =
                                               Σ(x – x̄)² =
Round to one more decimal than the data.
Don’t round until the end.
Include the appropriate units.
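The step-by-step procedure for the sample variance and standard deviation can be traced in code with the Bank A wait times from the table. This is a plain translation of the steps, with variable names chosen for this sketch.

```python
from math import sqrt

# Bank A wait times (in minutes) from the table.
waits = [5.2, 6.2, 7.5, 8.4, 9.2]
n = len(waits)

x_bar = sum(waits) / n                       # 1. mean of the sample
deviations = [x - x_bar for x in waits]      # 2. deviation of each entry
squares = [d ** 2 for d in deviations]       # 3. square each deviation
sum_sq = sum(squares)                        # 4. sum of the squared deviations
variance = sum_sq / (n - 1)                  # 5. divide by n - 1
std_dev = sqrt(variance)                     # 6. square root

print(f"mean = {x_bar:.2f} min")
print(f"sum of squared deviations = {sum_sq:.2f}")
print(f"s^2 = {variance:.2f} min^2, s = {std_dev:.2f} min")
```

Rounding happens only in the print statements, in line with the advice not to round until the end.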
49 Find the Standard Deviation and Variance for Bank B (1 wait line)
Wait time, x (in min)   Deviation: x – x̄       Squares: (x – x̄)²
20.127.116.11.77.9
                                               Σ(x – x̄)² =
Round to one more decimal than the data.
Don’t round until the end.
Include the appropriate units.
50 Sample versus Population Standard Deviation and Variance
                     Sample (Statistics)   Population (Parameters)
Mean                 x̄                     µ
Standard Deviation   s                     σ
Variance             s²                    σ²
51 Sample versus Population Standard Deviation
Note: Unlike x̄ and µ, the formulas for s and σ are not mathematically the same:
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
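Python's standard statistics module provides both versions, which makes the difference easy to see: stdev divides by n - 1 and pstdev divides by N. The sketch reuses the Bank A wait times from slide 48 and treats them first as a sample and then, purely for comparison, as if they were an entire population.

```python
from math import sqrt
from statistics import pstdev, stdev

# Bank A wait times (in minutes) from slide 48.
waits = [5.2, 6.2, 7.5, 8.4, 9.2]

s = stdev(waits)        # sample formula: divides by n - 1
sigma = pstdev(waits)   # population formula: divides by N

print(f"sample standard deviation     s     = {s:.2f} min")
print(f"population standard deviation sigma = {sigma:.2f} min")
```

For the same data, the sample formula always gives the larger result because it divides by the smaller number.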
52 Standard Deviation: Key Points (When would s = 0?)
The standard deviation is a measure of variation of all values from the mean. The larger s is, the more the data varies.
The units of the standard deviation s are the same as the units of the original data values. (The variance has squared units.)
The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
53 Interpreting Standard Deviation
Standard deviation is a measure of the typical amount an entry deviates from the mean.
The more the entries are spread out, the greater the standard deviation.
54 Solution: Using Technology to Find the Standard Deviation
Output labels: Sample Mean; Sample Standard Deviation
55 Using Technology
The gas mileage of two cars is sampled over various conditions:
Car A: (mpg)
Car B: (mpg)
Which car do you think gets “better” mpg? Use a calculator to find the mean and standard deviation for each to justify your choice.
56 Standard Deviation and “Spread”
How does “s” show how much the data varies? Three methods:
1. Range Rule of Thumb
2. Chebyshev’s Theorem
3. The Empirical Rule
57 The Range Rule of Thumb
Range Rule: For most data sets, the majority of the data lies within 2 standard deviations of the mean.
Recall: Range = High – Low
Estimate: Range ≈ 4s
Alternatively, if the range is known, you can use the range rule to estimate the standard deviation: s ≈ Range / 4
58 Using the Range Rule of Thumb
A sample of women’s heights has a mean of 64 inches and a standard deviation of 2.5 inches.
Using the range rule, “most” women fall within what heights?
What would be an “unusual” height?
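A possible way to work the heights question is to compute the interval two standard deviations either side of the mean and treat values outside it as unusual. The function names below are hypothetical, and "unusual" is taken to mean "more than 2 standard deviations from the mean", which is how the range rule is usually applied.

```python
def usual_interval(mean, s):
    """Interval within 2 standard deviations of the mean."""
    return (mean - 2 * s, mean + 2 * s)

def estimate_std_dev(high, low):
    """Range rule in reverse: s is roughly Range / 4."""
    return (high - low) / 4

def is_unusual(x, mean, s):
    """True if x lies outside the usual interval."""
    low, high = usual_interval(mean, s)
    return x < low or x > high

# Women's heights: mean 64 inches, standard deviation 2.5 inches.
low_height, high_height = usual_interval(64, 2.5)
print(f"Most heights fall between {low_height} and {high_height} inches")
print("Is 71 inches unusual?", is_unusual(71, 64, 2.5))
```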
59 Using the Range Rule of Thumb
The sample of Exam Scores used in the class handout had a mean of ____. Which of the following is most likely the standard deviation of the sample?
s = 3.6    s = 12.8    s = 74.5
Use the range rule to help justify your choice.
60 Chebyshev’s Theorem
For data with any distribution, the proportion (or fraction) of the data lying within K standard deviations of the mean is always at least 1 – 1/K², where K is any number greater than 1.
For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.
For K = 3, at least 8/9 (or about 89%) of all values lie within 3 standard deviations of the mean.
61 Using Chebyshev’s Theorem
A sample of salaries at an elementary school has a mean of $32,000 and a standard deviation of $3000.
Use Chebyshev’s Theorem to describe how the salaries are spread out.
Would a salary of $28,000 be “unusual”?
Would a salary of $45,000 be “unusual”?
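The salary questions can be approached by computing the Chebyshev bound 1 - 1/K² and the number of standard deviations each salary lies from the mean. The function names are assumptions for this sketch, and deciding what counts as "unusual" is still a judgment call left to the reader.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2

def deviations_from_mean(x, mean, s):
    """How many standard deviations x lies from the mean."""
    return abs(x - mean) / s

mean_salary, sd_salary = 32_000, 3_000

# At least 75% of salaries lie within 2 standard deviations of the mean.
k2_interval = (mean_salary - 2 * sd_salary, mean_salary + 2 * sd_salary)
print(f"At least {chebyshev_bound(2):.0%} of salaries lie in {k2_interval}")

for salary in (28_000, 45_000):
    k = deviations_from_mean(salary, mean_salary, sd_salary)
    print(f"${salary:,} is {k:.2f} standard deviations from the mean")
```

A salary more than 4 standard deviations from the mean falls outside an interval that must hold at least 1 - 1/16 of the data, whatever the distribution's shape.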
62 The Empirical Rule
Empirical (68-95-99.7) Rule: For data sets having a symmetric, bell-shaped distribution:
About 68% of all values fall within 1 standard deviation of the mean.
About 95% of all values fall within 2 standard deviations of the mean.
About 99.7% of all values fall within 3 standard deviations of the mean.
66 Example: Using the Empirical Rule
A sample of IQs has a symmetric distribution with a mean of 100 and a standard deviation of 15.
Sketch the distribution.
68% of people have an IQ between what two values?
What percent of people have an IQ between 70 and 130?
What percent of people have an IQ between 100 and 115?
What percent of people have an IQ above 145?
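The IQ questions can be worked by building the 1, 2, and 3 standard deviation intervals and using the symmetry of the distribution to split percentages in half. This sketch assumes the distribution is bell-shaped so that the 68-95-99.7 percentages apply; the names are chosen for illustration.

```python
# Approximate percentages within k standard deviations of the mean.
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}

def interval(mean, s, k):
    """Interval within k standard deviations of the mean."""
    return (mean - k * s, mean + k * s)

mean_iq, sd_iq = 100, 15

within_1 = interval(mean_iq, sd_iq, 1)     # about 68% of IQs
within_2 = interval(mean_iq, sd_iq, 2)     # about 95% of IQs
within_3 = interval(mean_iq, sd_iq, 3)     # about 99.7% of IQs

# By symmetry, half of the 68% lies between the mean and one deviation above it.
pct_100_to_115 = EMPIRICAL[1] / 2
# What remains outside 3 deviations is split evenly between the two tails.
pct_above_145 = (100 - EMPIRICAL[3]) / 2

print(f"About 68% of IQs are between {within_1[0]} and {within_1[1]}")
print(f"About {EMPIRICAL[2]}% are between {within_2[0]} and {within_2[1]}")
print(f"About {pct_100_to_115}% are between 100 and 115")
print(f"About {pct_above_145:.2f}% are above {within_3[1]}")
```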