Chapter 4: Displaying Quantitative Data. Histograms Bins – equal width “piles” that we use to divide up quantitative data The bins and the counts in each.


1 Chapter 4: Displaying Quantitative Data

2 Histograms Bins – equal-width “piles” that we use to divide up quantitative data. The bins and the counts in each bin give the distribution of the quantitative variable.

3 Enron Corporation Problem Month to Month Stock Price Change Enron Corporation was one of the world’s biggest energy supply corporations, dominating the energy trading business. In 1985, Enron stock sold for about $5 a share. In 2000, Enron stock closed at a 52-week high of $89.75. Less than a year later it hit a low of $0.25. Were there hints of trouble that might have been seen?

4 Monthly Price Change
Year     Jan     Feb     Mar     Apr     May    June    July     Aug    Sept     Oct     Nov     Dec
1997   -1.44   -0.75   -0.69   -0.88    0.12    0.75    0.81   -1.75    0.69   -0.22   -0.16    0.34
1998    0.78    0.62    2.44   -0.28    2.22   -0.50    2.06   -0.88   -4.50    4.12    1.16   -0.50
1999    3.28    3.34   -1.22    0.47    5.62   -1.59    4.31    1.47   -0.72   -0.38   -3.25    0.03
2000    5.72   21.06    4.50    4.56   -1.25   -1.19   -3.12    8.00    9.31    1.12   -3.19  -17.75
2001   14.38   -1.08  -10.11  -12.11    5.84   -9.37   -4.74   -2.69  -10.61   -5.85  -17.16  -11.59

5 Histogram of Enron Data

6 Relative Frequency Histogram Replaces the counts on the vertical axis with the percentages of the total number of cases falling in each bin.
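The binning behind a histogram and a relative frequency histogram can be sketched in a few lines of Python. This is only an illustration: the values are transcribed from the Monthly Price Change table, while the $5 bin width and the helper name `bin_counts` are assumptions, since the slides do not say which bins the original histogram used.

```python
import math
from collections import Counter

# Monthly price changes transcribed from the Enron table (1997-2001).
changes = [
    -1.44, -0.75, -0.69, -0.88, 0.12, 0.75, 0.81, -1.75, 0.69, -0.22, -0.16, 0.34,
    0.78, 0.62, 2.44, -0.28, 2.22, -0.50, 2.06, -0.88, -4.50, 4.12, 1.16, -0.50,
    3.28, 3.34, -1.22, 0.47, 5.62, -1.59, 4.31, 1.47, -0.72, -0.38, -3.25, 0.03,
    5.72, 21.06, 4.50, 4.56, -1.25, -1.19, -3.12, 8.00, 9.31, 1.12, -3.19, -17.75,
    14.38, -1.08, -10.11, -12.11, 5.84, -9.37, -4.74, -2.69, -10.61, -5.85, -17.16, -11.59,
]

BIN_WIDTH = 5.0  # assumed bin width in dollars; the slides do not state one


def bin_counts(values, width):
    """Count the values in each equal-width bin [k*width, (k+1)*width)."""
    return Counter(math.floor(v / width) for v in values)


counts = bin_counts(changes, BIN_WIDTH)
total = len(changes)

# One row per non-empty bin: the count is the histogram bar height,
# the percentage is the relative frequency histogram bar height.
for k in sorted(counts):
    low, high = k * BIN_WIDTH, (k + 1) * BIN_WIDTH
    pct = 100 * counts[k] / total
    print(f"[{low:6.1f}, {high:6.1f})  count={counts[k]:2d}  {pct:5.1f}%  {'#' * counts[k]}")
```

Both displays have the same shape; only the vertical scale changes from counts to percentages.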

7 Stem-and-Leaf Plot
8 | 8
8 | 000044
7 | 6666
7 | 2222
6 | 8888
6 | 0444
5 | 6
Pulse Rate (8|8 means 88 beats/min)
Contains all the information found in a histogram. When drawn carefully, it satisfies the area principle and shows the distribution. Preserves the individual data values. When turned on its side, it looks roughly like the histogram of the data.

8 How Many Bins?
8 | 0000448
7 | 22226666
6 | 04448888
5 | 6
Pulse Rate (8|8 means 88 beats/min)
8 | 8
8 |
8 | 44
8 |
8 | 0000
7 |
7 | 6666
7 |
7 | 2222
Pulse Rate (8|8 means 88 beats/min)
Too few? Too many? It’s a judgment call. Use enough bins to be meaningful, but not so many that the data is too spread out.
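The pulse-rate displays can be reproduced with a short Python sketch. The 24 values are read off the stem-and-leaf plots; the function name `stem_and_leaf` and its `split` parameter (lines per stem) are assumptions made for illustration. Unlike the slide, this version does not print empty lines for stems with no leaves.

```python
from collections import defaultdict

# Pulse rates read off the slide's stem-and-leaf display (beats/min).
pulses = [56,
          60, 64, 64, 64, 68, 68, 68, 68,
          72, 72, 72, 72, 76, 76, 76, 76,
          80, 80, 80, 80, 84, 84, 88]


def stem_and_leaf(values, split=1):
    """Return display lines, highest stem first.

    split is the number of lines per stem: 1 keeps each ten together,
    2 separates leaves 0-4 from leaves 5-9, and so on.
    """
    leaves_per_line = 10 // split
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // leaves_per_line)].append(leaf)
    return [f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
            for (stem, _), leaves in sorted(rows.items(), reverse=True)]


for split in (1, 2):
    print(f"{split} line(s) per stem:")
    print("\n".join(stem_and_leaf(pulses, split)))
    print()
```

Changing `split` is the stem-and-leaf equivalent of changing the bin width of a histogram.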

9 Dotplots Simple display. Places a dot along an axis for each case in the data. Can be plotted horizontally with the counts on the vertical axis, or vertically with the counts on the horizontal axis. See graph on page 49.
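A text-only dotplot takes very little code. The sketch below reuses the pulse rates from the stem-and-leaf slide and draws the vertical orientation (values down the side, dots running across); the helper name `dotplot_rows` is hypothetical.

```python
from collections import Counter

# Pulse rates read off the stem-and-leaf slide (beats/min).
pulses = [56,
          60, 64, 64, 64, 68, 68, 68, 68,
          72, 72, 72, 72, 76, 76, 76, 76,
          80, 80, 80, 80, 84, 84, 88]


def dotplot_rows(values):
    """One row per distinct value, with one 'o' for each case."""
    counts = Counter(values)
    return [f"{v:>3} | {'o' * counts[v]}" for v in sorted(counts)]


print("\n".join(dotplot_rows(pulses)))
```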

10 Quantitative Data Condition The data are values of a quantitative variable whose units are known. This condition must be met in order to create a stem-and-leaf plot, histogram, or dotplot.

11 S.O.C.S. How we describe a distribution: Shape, Outliers, Center, Spread

12 Shape Describe any modes. Describe any symmetry. Describe any tails.

13 Shape How many “humps?” – Humps are called modes

14 Multimodal

15 Uniform

16 Symmetry

17 Tails The (usually) thinner ends of a distribution are called the tails. The right tail is longer, so the data is skewed to the right.

18 The left tail is longer, so the data is skewed to the left.

19 Symmetric, Skewed Right, or Skewed Left? Neither tail is longer and the data appears to be symmetric.

20 Gaps Help us see multiple modes. May help us notice when the data may have come from different sources or contain more than one group.

21 Outliers? Any data that appears not to “belong” with the rest of the distribution. Always refer to outliers with vague terms. NEVER just “throw away” an outlier – it can be extremely important in context! Look for gaps in the data – that is usually where you find the outliers.

22 But How do I Know it’s an Outlier? Shape, gaps, and even outliers are judgment calls at this point. There are generally accepted “tests” for outliers that we statisticians have derived (we’ll see them shortly) and some graphs have clear skew, but there is some room for interpretation. Trust your eyes and what you “see” in the data!

23 Center It could be the “mean” or the “median”. An easy description of a “typical” value and a concise summary of the whole batch of numbers. When a histogram is unimodal and symmetric, it’s easy to eyeball and give a rough estimate of the center. Not so clear for other histograms (skewed, multimodal, etc.). In fact, for a multimodal histogram, the center may be meaningless because it could be showing different sets of data.
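As a rough illustration of the two candidate centers, the sketch below computes the mean and median of the pulse rates from the stem-and-leaf slide using Python's standard `statistics` module. The extra value of 180 is hypothetical and is added only to show how differently the two measures react to a single extreme case.

```python
from statistics import mean, median

# Pulse rates read off the stem-and-leaf slide (beats/min).
pulses = [56,
          60, 64, 64, 64, 68, 68, 68, 68,
          72, 72, 72, 72, 76, 76, 76, 76,
          80, 80, 80, 80, 84, 84, 88]

print(f"mean   = {mean(pulses):.2f}")
print(f"median = {median(pulses):.2f}")

# Hypothetical extreme value (not in the slides) appended to the data.
with_outlier = pulses + [180]
print(f"mean with outlier   = {mean(with_outlier):.2f}")
print(f"median with outlier = {median(with_outlier):.2f}")
```

For this roughly symmetric, unimodal data set the mean and median nearly agree; the hypothetical outlier pulls the mean upward while the median stays put.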

24

25 Spread Variation matters – Are the data values tightly clustered around the center? – Is the data widely spread out?
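One way to put numbers on "tightly clustered" versus "widely spread out" is sketched below. Both data sets are hypothetical and share the same center; the range and the standard deviation are used here as spread measures by assumption, since the slide does not name a specific one.

```python
from statistics import mean, pstdev

# Two hypothetical data sets with the same center but different spread.
tight = [70, 71, 72, 73, 74]
wide = [52, 62, 72, 82, 92]


def data_range(values):
    """Distance between the largest and smallest value."""
    return max(values) - min(values)


for name, data in (("tight", tight), ("wide", wide)):
    print(f"{name:5s} mean={mean(data):.1f}  range={data_range(data)}  "
          f"std dev={pstdev(data):.2f}")
```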

26

27 Just Checking It’s often a good idea to think about what the distribution of the data set might look like before we collect the data. What do you think the distribution of the following data sets will look like? Be sure to think in terms of SOCS!

28 Just Checking Grades of those who studied and grades of those who did not study.

29

30 Monthly temperatures in Durham, NC.

31

32 Body masses collected from 1000 people of various ages.

33

34 “Think, Show, Tell” Example Let’s go back to the Kentucky Derby example from Chapter 2. We’re going to focus on the distribution of race durations. Think: What do we want to find out? Identify the variables and report the W’s. We want to see the distribution of race times of the Kentucky Derby. We have the data from races between 1875 and 2004. Be sure to check the appropriate condition!

35 Kentucky Derby Revisited Show: We almost always want to make a histogram with computer software/graphing calculator when the data is quantitative. *Ask yourself – is the histogram close to what we expected?

36 Kentucky Derby Revisited Tell: Describe the distribution using SOCS. The main body of the distribution is bimodal and fairly symmetric, with most of the data clustered between 115 and 140 seconds. There appears to be an upper outlier, indicating that one race was run much slower than the others. It appears as though this data set describes two different data sets – one when the race was 1.5 miles and another when the race was shortened to 1.25 miles. Because of this, the center of the entire data set would probably be of little importance to us. It appears as though the spread of the data is small within each mode. This may suggest that pace did not drastically change over the 100+ years the race was tracked.

37 Comparing Infant Death Rates In 2001 the infant death rate in the U.S. was 6.8 deaths per 1000 live births. How does the rate differ from region to region? The Kaiser Family Foundation collected data from all 50 states and the District of Columbia, allowing us to compare the infant death rates in the Northeast and Midwest to those in the South and West.

38 Comparing Infant Death Rates Think: The W’s and how, but now let’s put the information into sentences: We want to compare infant death rates for regions of the United States. We have the 2001 rates for each state and the District of Columbia.

39 Comparing Infant Death Rates Show: The rates are quantitative, so a stem-and-leaf display is appropriate.
Infant Death Rates (by state), 2001
South and West | Stem | Northeast and Midwest
           567 |  10  |
            48 |   9  |
       1973516 |   8  | 80
          3623 |   7  | 27754144
         79224 |   6  | 85118
      88994497 |   5  | 85063
             8 |   4  |
               |   3  | 8
(|3|8 means 3.8 deaths per 1000 live births)

40 Comparing Infant Death Rates Tell: In general, infant death rates appear to have been somewhat higher for states in the South and West than in the Northeast and Midwest. The distribution is roughly symmetric, but may be slightly skewed to the right for the South and West. Nationally, most states had rates below 9. Infant death rates were more consistent in the Northeast and Midwest; no states there were above 9, but one state had an unusually low 3.8 infant deaths per 1000 live births.
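The comparison in the Tell step can be checked numerically from the back-to-back stem-and-leaf display. The sketch below rebuilds the rates from the stems and leaves as read off the slide (30 values for the South and West, 21 for the Northeast and Midwest); the helper name `from_stems` is hypothetical, and the median and range are used as summary measures by assumption.

```python
from statistics import median

# Leaves per stem, read off the back-to-back display (stem = whole deaths
# per 1000 live births, leaf = tenths).
south_west_leaves = {10: "567", 9: "48", 8: "1973516", 7: "3623",
                     6: "79224", 5: "88994497", 4: "8"}
northeast_midwest_leaves = {8: "80", 7: "27754144", 6: "85118",
                            5: "85063", 3: "8"}


def from_stems(leaves_by_stem):
    """Turn {stem: 'leaves'} into a sorted list of rates."""
    return sorted(stem + int(leaf) / 10
                  for stem, leaves in leaves_by_stem.items()
                  for leaf in leaves)


south_west = from_stems(south_west_leaves)
northeast_midwest = from_stems(northeast_midwest_leaves)

for name, rates in (("South and West", south_west),
                    ("Northeast and Midwest", northeast_midwest)):
    print(f"{name:22s} n={len(rates):2d}  median={median(rates):.2f}  "
          f"min={rates[0]:.1f}  max={rates[-1]:.1f}")

all_rates = south_west + northeast_midwest
below_nine = sum(r < 9 for r in all_rates)
print(f"{below_nine} of {len(all_rates)} rates are below 9")
```

The counts add up to 51, matching the 50 states plus the District of Columbia.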

41 When Order Matters Timeplots – Do we want to see the data in a specific order? – Are we looking for patterns over time? – Time plots use the x-axis for time (year, month, day, hour, etc.) and the y-axis to plot the data points. – Often connected because time is continuous

42 Back to Enron What does the timeplot show that the histogram can’t? Monthly Change in Stock Price ($)
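A histogram discards the order of the observations; a timeplot keeps it. The sketch below puts the Monthly Price Change table back into chronological order and summarizes the size of the typical monthly swing per year. The function names and the "mean absolute change" summary are assumptions chosen for illustration, not something the slides prescribe.

```python
# Monthly price changes by year (Jan..Dec), transcribed from the Enron table.
changes_by_year = {
    1997: [-1.44, -0.75, -0.69, -0.88, 0.12, 0.75, 0.81, -1.75, 0.69, -0.22, -0.16, 0.34],
    1998: [0.78, 0.62, 2.44, -0.28, 2.22, -0.50, 2.06, -0.88, -4.50, 4.12, 1.16, -0.50],
    1999: [3.28, 3.34, -1.22, 0.47, 5.62, -1.59, 4.31, 1.47, -0.72, -0.38, -3.25, 0.03],
    2000: [5.72, 21.06, 4.50, 4.56, -1.25, -1.19, -3.12, 8.00, 9.31, 1.12, -3.19, -17.75],
    2001: [14.38, -1.08, -10.11, -12.11, 5.84, -9.37, -4.74, -2.69, -10.61, -5.85, -17.16, -11.59],
}


def time_series(by_year):
    """Flatten the table into chronological ((year, month), change) pairs."""
    return [((year, month + 1), value)
            for year in sorted(by_year)
            for month, value in enumerate(by_year[year])]


def mean_abs_change(values):
    """Average size of a monthly change, ignoring its direction."""
    return sum(abs(v) for v in values) / len(values)


series = time_series(changes_by_year)  # the points a timeplot would connect

for year in sorted(changes_by_year):
    size = mean_abs_change(changes_by_year[year])
    print(f"{year}: typical monthly swing ${size:5.2f}  {'#' * round(size)}")
```

The `series` list is what would go on the x-axis (time) and y-axis (change) of a timeplot; the yearly summary hints at a pattern over time that a histogram of the same 60 values cannot show.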

43 Re-expressing Skewed Data One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function. Often, we will transform data using logarithms to create a more symmetric distribution – Don’t worry! We won’t be doing it by hand
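A small sketch of re-expression with logarithms follows. The data are hypothetical (a made-up right-skewed batch), and the skew indicator, (mean − median) divided by the standard deviation, is an assumption used here only as a rough numeric stand-in for looking at the histogram.

```python
import math
from statistics import mean, median, pstdev

# Hypothetical right-skewed data: most values small, a few very large.
raw = [1, 2, 2, 3, 4, 5, 8, 15, 40, 120]


def skew_indicator(values):
    """(mean - median) / std dev: positive when the right tail is longer."""
    return (mean(values) - median(values)) / pstdev(values)


# Re-express each value with a base-10 logarithm (all values must be > 0).
logged = [math.log10(v) for v in raw]

print(f"raw:    skew indicator = {skew_indicator(raw):.2f}")
print(f"logged: skew indicator = {skew_indicator(logged):.2f}")
```

The logarithm pulls in the long right tail more than it moves the small values, so the indicator moves closer to zero after the transformation.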

44 What Can Go Wrong? Don’t make a histogram of a categorical variable. Don’t look for shape, center, and spread of a bar chart. Don’t use bars in every display – save them for histograms and bar charts. Choose a bin width appropriate to the data. Avoid inconsistent scales. Label clearly.

