Published by Edgar Hunter. Modified about 1 year ago.

1

2
Chapter 4 Displaying and Summarizing Quantitative Data
CHAPTER OBJECTIVES
At the conclusion of this chapter you should be able to:
1) Construct graphs that appropriately describe quantitative data.
2) Calculate and interpret numerical summaries of quantitative data.
3) Combine numerical methods with graphical methods to analyze a data set.
4) Apply graphical methods of summarizing data to choose appropriate numerical summaries.
5) Apply software and/or calculators to automate graphical and numerical summary procedures.

3
Displaying Quantitative Data: Histograms; Stem-and-Leaf Displays

4
Relative Frequency Histogram of Exam Grades
[Figure: horizontal axis: Grade; vertical axis: Relative frequency]

5
Frequency Histogram

6
Histograms
A histogram shows three general types of information:
• It provides a visual indication of where the approximate center of the data is.
• We can gain an understanding of the degree of spread, or variation, in the data.
• We can observe the shape of the distribution.

7
All 200 m Races 20.2 secs or less

8
Histograms Showing Different Centers

9
Histograms Showing Different Centers (football head coach salaries)

10
Histograms - Same Center, Different Spread (football head coach salaries)

11
Excel Example: NFL Salaries

12
Statcrunch Example: NFL Salaries

13
Grades on a statistics exam Data:

14
Frequency Distribution of Grades
Class Limits      Frequency
40 up to 50       2
50 up to 60       6
60 up to 70       8
70 up to 80       7
80 up to 90       5
90 up to 100      2
Total             30

15
Relative Frequency Distribution of Grades
Class Limits      Relative Frequency
40 up to 50       2/30 = .067
50 up to 60       6/30 = .200
60 up to 70       8/30 = .267
70 up to 80       7/30 = .233
80 up to 90       5/30 = .167
90 up to 100      2/30 = .067
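Each relative frequency is the class frequency divided by the total number of grades (n = 30). A minimal Python sketch of both steps, building the frequency table and converting it to relative frequencies, assuming six equal-width classes of width 10 starting at 40, classes that include their lower limit and exclude their upper limit ("lower up to upper"), and a last class that also includes 100. The helpers `frequency_table` and `relative_frequencies` are hypothetical names used only for illustration:

```python
def frequency_table(data, start, width, num_classes):
    """Count observations per class "lower up to upper".

    Assumption: each class includes its lower limit and excludes its upper
    limit, except the last class, which also includes its upper limit
    (so a grade of exactly 100 lands in the top class).
    """
    top = start + width * num_classes
    counts = [0] * num_classes
    for x in data:
        if not start <= x <= top:
            raise ValueError(f"{x} is outside [{start}, {top}]")
        # Integer class index; a value equal to `top` goes into the last class.
        index = min(int((x - start) // width), num_classes - 1)
        counts[index] += 1
    return counts


def relative_frequencies(counts, digits=3):
    """Relative frequency of each class = class frequency / total count."""
    total = sum(counts)
    return [round(c / total, digits) for c in counts]


# Grade-class frequencies from the table above (n = 30)
grade_counts = [2, 6, 8, 7, 5, 2]
print(relative_frequencies(grade_counts))  # [0.067, 0.2, 0.267, 0.233, 0.167, 0.067]

# Hypothetical raw grades, just to show the counting step
print(frequency_table([45, 50, 99.5, 100], start=40, width=10, num_classes=6))
# [1, 1, 0, 0, 0, 2]
```

The relative frequencies always sum to 1 (up to rounding), which is why a relative frequency histogram has the same shape as the frequency histogram, only with a rescaled vertical axis.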

16
Relative Frequency Histogram of Grades
[Figure: horizontal axis: Grade; vertical axis: Relative frequency]

17
Based on the histogram, about what percent of the values are between 47.5 and 52.5?
1. 50%
2. 5%
3. 17%
4. 30%

18
Stem-and-Leaf Displays
Stem-and-leaf displays have the following general appearance:
stem | leaf

19
Stem-and-Leaf Displays
• Partition each number in the data into a “stem” and a “leaf”.
• Constructing a stem-and-leaf display:
1) Determine the stem and leaf partition (5-20 stems).
2) Write the stems in a column with the smallest stem at the top; include all stems in the range of the data.
3) Use only 1 digit in each leaf; drop extra digits or round off.
4) Record the leaf for each number in the corresponding stem row; ordering the leaves in each row helps.
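The construction steps can be sketched in Python. This minimal version assumes the data are already non-negative integers (the rounding or truncation of step 3 is done beforehand) and fixes the partition at stem = tens digit, leaf = ones digit, the split used in the employee-age example; `stem_and_leaf` is a hypothetical helper and the ages below are hypothetical:

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return a text stem-and-leaf display (stem = tens digit, leaf = ones digit).

    Assumes non-negative integers. Every stem from the smallest to the largest
    is listed (step 2), even when it has no leaves, and each row's leaves are
    sorted (step 4).
    """
    rows = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)  # e.g. 18 -> stem 1, leaf 8
        rows[stem].append(leaf)
    low, high = min(rows), max(rows)
    lines = []
    for stem in range(low, high + 1):
        leaves = "".join(str(leaf) for leaf in sorted(rows[stem]))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return "\n".join(lines)


# Hypothetical ages, including one 95-year-old to show the empty stems 5-8
print(stem_and_leaf([18, 22, 25, 31, 34, 34, 47, 95]))
```

Listing the empty stems 5 through 8 is what makes a gap (and a possible outlier such as the 95-year-old) visible in the display.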

20
Example: employee ages at a small company; stem: 10’s digit; leaf: 1’s digit.
For 18: stem = 1, leaf = 8, written 18 = 1 | 8.

21
Suppose a 95-year-old is hired.

22
Number of TD passes by NFL teams: season (stems are 10’s digits)

23
Pulse Rates n = 138

24
Advantages/Disadvantages of Stem-and-Leaf Displays
• Advantages
1) each measurement is displayed
2) leaves are in ascending order in each stem row
3) relatively simple (if the data set is not too large)
• Disadvantages
the display becomes unwieldy for large data sets

25
Population of 185 US cities with populations between 100,000 and 500,000
• Multiply stems by 100,000.

26
Back-to-back stem-and-leaf displays. TD passes by NFL teams: , multiply stems by

27
Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77? Stems are 10’s digits.

28
Interpreting Graphical Displays: Shape
• A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
• A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
• Not all distributions have a simple overall shape, especially when there are few observations.
[Figure panels: symmetric distribution; skewed distribution; complex, multimodal distribution]

29
Heights of Students in Recent Stats Class

30
Shape (cont.): Female heart attack patients in New York state
Age: left-skewed; Cost: right-skewed

31
Shape (cont.): Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
[Figure labels: Alaska, Florida]
The overall pattern is fairly symmetric except for 2 states that clearly do not belong to the main pattern. Alaska and Florida have an unusual representation of the elderly in their populations. A large gap in the distribution is typically a sign of an outlier.

32
Center: what is the typical price of a frozen personal pizza? About $2.65

33
Spread: fuel efficiency, 4 vs. 8 cylinders
4 cylinders: more spread; 8 cylinders: less spread

34
Other Graphical Methods for Economic Data
• Time plots plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.
• Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.).

35
Heat Maps

36
Unemployment Rate, by Educational Attainment

37
Water Use During Super Bowl

38
Winning Times, 100 m Dash

39
Numerical Summaries of Quantitative Data
Numerical and More Graphical Methods to Describe Univariate Data

40
Two characteristics of a data set to measure:
• center: measures where the “middle” of the data is located
• variability: measures how “spread out” the data are

41
The median: a measure of center
Given a set of n measurements arranged in order of magnitude,
Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even
• Ex. 2, 4, 6, 8, 10; n = 5; median = 6
• Ex. 2, 4, 6, 8; n = 4; median = (4 + 6)/2 = 5
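The two median rules can be written as a short Python function. `median` here is a hypothetical helper written for illustration; Python's standard library also provides `statistics.median`, which follows the same odd/even rule:

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    data = sorted(values)  # the rule needs the data in order of magnitude
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]  # n odd: the single middle value
    return (data[mid - 1] + data[mid]) / 2  # n even: mean of the 2 middle values


print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
```

Sorting inside the function means the input does not have to be ordered already; for n = 62 the two middle values are the 31st and 32nd ordered values.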

42
Student Pulse Rates (n=62) 38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76, 77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87, 90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103 Median = (75+76)/2 = 75.5

43
Medians are used often
• 2014 baseball salaries: median $1,450,000 (max = $28,000,000, Zack Greinke; min = $500,000)
• Median fan age: MLB 45; NFL 43; NBA 41; NHL 39
• Median existing home sales price: May 2011 $166,500; May 2010 $174,600
• Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

44
The median splits the histogram into 2 halves of equal area

45
Examples
• Example: n = 7 (ordered): m = 14.1
• Example: n = 8 (ordered): m = (sum of the two middle values)/2 = 15.8

46
Below are the annual tuition charges at 7 public universities. What is the median tuition?

47

48
Measures of Spread: the range and the interquartile range

49
Ways to measure variability
• range = largest - smallest
• The range is OK sometimes but, in general, too crude: it is sensitive to a single large or small data value.
• The range measures spread by examining the ends of the data. A better way to measure spread is to examine the middle portion of the data.
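A quick Python illustration of why the range is sensitive to one extreme value, using the small data set 2, 4, 6, 8, 10 from the median examples plus one hypothetical extreme value of 100 (`data_range` is a hypothetical helper name):

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


data = [2, 4, 6, 8, 10]
print(data_range(data))          # 8
print(data_range(data + [100]))  # 98: one large value changes the range from 8 to 98
```

Only the two end values enter the calculation, so adding a single extreme observation can change the range dramatically even though the middle of the data barely moves.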

50
Quartiles: Measuring spread by examining the middle
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).
[Figure: Q1 = first quartile = 2.3; m = median = 3.4; Q3 = third quartile = 4.2]

51
Quartiles and median divide data into 4 pieces
[Figure: Q1, M, and Q3 split the sorted data into four parts, each containing 1/4 of the data]

52
Quartiles are common measures of spread
• University of Southern California
• Economic Value of College Majors

53
Rules for Calculating Quartiles
Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.
Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.
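The rules above can be sketched in Python as a minimal illustration; `quartiles` is a hypothetical helper that applies `statistics.median` from the standard library to each half. Software and calculators may use other quartile definitions (for example, interpolating between ordered values), so their Q1 and Q3 can differ slightly from this hand rule:

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    n odd:  the overall median is included in both halves.
    n even: the data split into two equal halves; the median is in neither.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("quartiles of an empty data set are undefined")
    half = (n + 1) // 2      # odd n: (n + 1)/2 values, the middle one shared
    lower = data[:half]      # smallest `half` values
    upper = data[n - half:]  # largest `half` values
    return median(lower), median(data), median(upper)


print(quartiles([2, 4, 6, 8, 10]))  # (4, 6, 8)
print(quartiles([2, 4, 6, 8]))      # (3.0, 5.0, 7.0)
```

Using `half = (n + 1) // 2` for both slices handles the two cases at once: for odd n the slices overlap in exactly the middle value, and for even n they meet without overlapping.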

54
Example: n = 10
• Median: m = (10 + 12)/2 = 22/2 = 11
• Q1: median of lower half; Q1 = 6
• Q3: median of upper half; Q3 = 16

55
Quartile example: odd number of data values
• HRs hit by Babe Ruth in each season as a Yankee
Ordered values:
Median: value in ordered position 8; median = 46
Lower half (including overall median):
Upper half (including overall median):

56
Pulse Rates n = 138
• Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70
• Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
• Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

57
Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

58
Interquartile range
• lower quartile: Q1
• middle quartile: median
• upper quartile: Q3
• interquartile range (IQR): IQR = Q3 - Q1 measures the spread of the middle 50% of the data

59
Example: beginning pulse rates
• Q3 = 78; Q1 = 63
• IQR = 78 - 63 = 15

60
Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is … What is the value of the IQR?

61
5-number summary of data: minimum, Q1, median, Q3, maximum
• Pulse data
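A minimal Python sketch of the 5-number summary, assuming the median-of-halves quartile rule given in "Rules for Calculating Quartiles"; `five_number_summary` is a hypothetical helper, applied here to the 62 student pulse rates listed in the median section (not to the n = 138 pulse data):

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are medians of the lower and upper halves of the sorted data;
    when n is odd the overall median is included in both halves.
    """
    data = sorted(values)
    if not data:
        raise ValueError("5-number summary of an empty data set is undefined")
    n = len(data)
    half = (n + 1) // 2
    q1 = median(data[:half])
    q3 = median(data[n - half:])
    return data[0], q1, median(data), q3, data[-1]


student_pulses = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70, 70, 70, 70, 70,
    71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76, 77, 77, 77, 77, 78, 78, 79, 79,
    80, 80, 80, 84, 84, 85, 85, 87, 90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98,
    98, 103,
]
print(five_number_summary(student_pulses))  # (38, 70, 75.5, 85, 103)
```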

62
Boxplot: display of 5-number summary
Five-number summary: min, Q1, m, Q3, max
Smallest = min = 0.6; Q1 = first quartile = 2.3; m = median = 3.4; Q3 = third quartile = 4.2; Largest = max = 6.1
[Figure: boxplot]

63
Boxplot: display of 5-number summary
• Example: age of 66 “crush” victims at rock concerts
5-number summary:

64
Boxplot construction
1) Construct a box with ends located at Q1 and Q3; in the box, mark the location of the median (usually with a line or a “+”).
2) Fences are determined by moving a distance 1.5(IQR) from each end of the box:
2a) the upper fence is 1.5*IQR above the upper quartile;
2b) the lower fence is 1.5*IQR below the lower quartile.
Note: the fences only help with constructing the boxplot; they do not appear in the final boxplot display.

65
Boxplot construction (cont.)
3) Whiskers: draw lines from the ends of the box left and right to the most extreme data values found within the fences.
4) Outliers: special symbols represent each data value beyond the fences.
4a) Sometimes a different symbol is used for “far outliers” that are more than 3 IQRs from the quartiles.
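The fence, whisker, and outlier rules can be sketched in Python. This minimal version takes Q1 and Q3 as inputs (computed by whatever quartile rule is in use); `boxplot_parts` is a hypothetical helper, and the sample values are hypothetical except Q1 = 2.3, Q3 = 4.2 and the value 7.9, which come from the suspected-outlier example in this chapter:

```python
def boxplot_parts(values, q1, q3):
    """Fences, whisker ends, outliers and far outliers for a boxplot.

    Fences sit 1.5*IQR beyond the quartiles; values beyond them are outliers.
    Outliers more than 3*IQR from the quartiles are also reported as far outliers.
    """
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    far = [x for x in outliers if x < q1 - 3 * iqr or x > q3 + 3 * iqr]
    return {
        "fences": (lower_fence, upper_fence),
        # whiskers run to the most extreme values still inside the fences
        "whiskers": (min(inside), max(inside)) if inside else None,
        "outliers": outliers,
        "far_outliers": far,
    }


# Hypothetical values; only Q1, Q3 and the value 7.9 come from the example
parts = boxplot_parts([0.6, 2.3, 3.4, 4.2, 6.1, 7.9], q1=2.3, q3=4.2)
print(parts["outliers"])  # [7.9]
print(parts["whiskers"])  # (0.6, 6.1)
```

Here the upper fence is about 4.2 + 2.85 = 7.05, so 7.9 is flagged as an outlier, while the upper whisker stops at 6.1, the largest value inside the fences. The fences are returned for checking but, as the construction notes say, they are not drawn.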

66
Boxplot: display of 5-number summary
Q1 = first quartile = 2.3; Q3 = third quartile = 4.2; Largest = max = 7.9
[Figure: boxplot]
• Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9
• Distance to Q3: 7.9 - 4.2 = 3.7
• 1.5 * IQR = 1.5 * 1.9 = 2.85
Individual #25 has a value of 7.9 years, which is 3.7 years above the third quartile. This is more than 2.85 = 1.5*IQR above Q3. Thus, individual #25 is a suspected outlier.

67
ATM Withdrawals by Day, Month, Holidays

68

69
Beginning-of-class pulses (n = 138)
• Q1 = 63, Q3 = 78
• IQR = 78 - 63 = 15
• 1.5(IQR) = 1.5(15) = 22.5
• Q1 - 1.5(IQR): 63 - 22.5 = 40.5
• Q3 + 1.5(IQR): 78 + 22.5 = 100.5

70
Below is a boxplot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Figure: Pass Catching Yards by Receivers]

71
Rock concert deaths: histogram and boxplot

72
Automating Boxplot Construction
• Excel versions before Excel 2016 do not draw boxplots “out of the box”; Excel 2016 and later include a built-in Box and Whisker chart.
• Many add-ins available on the internet give Excel the capability to draw boxplots.
• Statcrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

73
Statcrunch Boxplot
Q1 = first quartile = 2.3; Q3 = third quartile = 4.2; Largest = max = 7.9

74
Tuition 4-yr Colleges

75
Statcrunch: NFL Salaries by Position

76
College Football Head Coach Salaries by Conference

77
2013 Major League Baseball Salaries by Team

78
End of General Numerical Summaries. Next: Numerical Summaries of Symmetric Data
