Presentation is loading. Please wait.

Presentation is loading. Please wait.

Copyright © 2009 Pearson Education, Inc. Chapter 3 Displaying and Describing Categorical Data.

Similar presentations


Presentation on theme: "Copyright © 2009 Pearson Education, Inc. Chapter 3 Displaying and Describing Categorical Data."— Presentation transcript:

1 Copyright © 2009 Pearson Education, Inc. Chapter 3 Displaying and Describing Categorical Data

2 Copyright © 2009Pearson Education, Inc. Slide 3- 2 Objectives The student will be able to: Appropriately display categorical data using a frequency table, bar chart, or pie chart. Using a contingency table, determine marginal and conditional distributions, noting any anomalies or extraordinary features revealed by the display of a variable.

3 Copyright © 2009Pearson Education, Inc. Slide 3- 3 The Three Rules of Data Analysis The three rules of data analysis won’t be difficult to remember: 1. Make a picture—things may be revealed that are not obvious in the raw data. These will be things to think about. 2. Make a picture—important features of and patterns in the data will show up. You may also see things that you did not expect. 3. Make a picture—the best way to tell others about your data is with a well-chosen picture.

4 Copyright © 2009Pearson Education, Inc. Slide 3- 4 Context Lets consider the people on board the Titanic on April 14, 1912 Who – People on the Titanic What – Survival status, age, sex, ticket class When – April 14, 1912 Where – North Atlantic How – A variety of sources and Internet sites Why – Historical interest

5 Copyright © 2009Pearson Education, Inc. Slide 3- 5 Frequency Tables: Making Piles Rather than looking at each case (each person aboard the Titanic) we can “pile” the data by counting the number of data values in each category of interest. We can organize these counts into a frequency table, which records the totals and the category names.

6 Copyright © 2009Pearson Education, Inc. Slide 3- 6 Frequency Tables: Making Piles (cont.) A relative frequency table is similar, but gives the percentages (instead of counts) for each category.

7 Copyright © 2009Pearson Education, Inc. Slide 3- 7 What’s Wrong With This Picture? You might think that a good way to show the Titanic data is with this display:

8 Copyright © 2009Pearson Education, Inc. Slide 3- 8 The Area Principle The ship display makes it look like most of the people on the Titanic were crew members, with a few passengers along for the ride. When we look at each ship, we see the area taken up by the ship, instead of the length of the ship. The ship display violates the area principle: The area occupied by a part of the graph should correspond to the magnitude of the value it represents.

9 Copyright © 2009Pearson Education, Inc. Slide 3- 9 Bar Charts A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison. A bar chart stays true to the area principle. Thus, a better display for the ship data is:

10 Copyright © 2009Pearson Education, Inc. Slide 3- 10 Bar Charts (cont.) A relative frequency bar chart displays the relative proportion of counts for each category. A relative frequency bar chart also stays true to the area principle. Replacing counts with percentages in the ship data:

11 Copyright © 2009Pearson Education, Inc. Practice Create a frequency table for number of siblings in our class data set. Use the categories: 0, 1, 2, 3, 4+. Is our data categorical? Sketch a bar chart and a relative frequency bar chart Slide 3- 11

12 Copyright © 2009Pearson Education, Inc. Slide 3- 12 When you are interested in parts of the whole, a pie chart might be your display of choice. Pie charts show the whole group of cases as a circle. They slice the circle into pieces whose size is proportional to the fraction of the whole in each category. Pie Charts

13 Copyright © 2009Pearson Education, Inc. Slide 3- 13 Make sure your data is categorical! Categorical Data Condition: You are displaying counts of cases within a category Stop – before you make a bar chart or pie chart:

14 Copyright © 2009Pearson Education, Inc. Slide 3- 14 Contingency Tables A contingency table allows us to look at two categorical variables together. It shows how individuals are distributed along each variable, contingent on the value of the other variable. Example: we can examine the class of ticket and whether a person survived the Titanic:

15 Copyright © 2009Pearson Education, Inc. Slide 3- 15 Contingency Tables (cont.) The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables. Each frequency distribution is called a marginal distribution of its respective variable. The marginal distribution of Survival is represented by the last column: The marginal distribution of Class is represented by the last row:

16 Copyright © 2009Pearson Education, Inc. Slide 3- 16 Contingency Tables (cont.) Each cell of the table gives the count for a combination of values of the two values. For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sunk.

17 Copyright © 2009Pearson Education, Inc. Practice Create a contingency table for our class data on number of siblings, contingent on gender of the student. Slide 3- 17

18 Copyright © 2009Pearson Education, Inc. Slide 3- 18 Conditional Distributions A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable. The following is the conditional distribution of ticket Class, conditional on having survived:

19 Copyright © 2009Pearson Education, Inc. Slide 3- 19 Conditional Distributions (cont.) The following is the conditional distribution of ticket Class, conditional on having perished: Note: we isolated the rows of the contingency table and divided each value by the total to get the percentages of these conditional distributions

20 Copyright © 2009Pearson Education, Inc. Slide 3- 20 Conditional Distributions (cont.) The conditional distributions tell us that there is a difference in class for those who survived and those who perished. This is better shown with pie charts of the two distributions:

21 Copyright © 2009Pearson Education, Inc. Slide 3- 21 Conditional Distributions (cont.) We see that the distribution of Class for the survivors is different from that of the nonsurvivors. This leads us to believe that Class and Survival are associated, and thus are are not independent of one another. The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.

22 Copyright © 2009Pearson Education, Inc. Slide 3- 22 Segmented Bar Charts A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles. Here is the segmented bar chart for ticket Class by Survival status:

23 Copyright © 2009Pearson Education, Inc. Practice Create a conditional distribution of number of siblings conditional on being female Create a conditional distribution of number of siblings conditional on being male Create a segmented bar chart of segmented pie chart of each distribution Do you think these distributions are independent or not? What are some limitations to drawing a conclusion? Slide 3- 23

24 Copyright © 2009Pearson Education, Inc. Slide 3- 24 Examples 200 adults shopping at a supermarket were asked about the highest level of education they had completed and whether or not they smoke cigarettes. Results are summarized below. Is there an association between education level and smoking? Think – What is the context? Show - If there is no association between smoking and education level then the conditional distributions should be the same between smokers and non-smokers. Calculate the conditional distribution of education by smoking status Make a graph (e.g. a segmented bar graph) Tell – It seems that smokers tend to have a lower education level than non-smokers. 64% of the smokers had only a high school education, compared to 40% of nonsmokers. And nonsmokers were almost twice a likely as smokers (48% to 26%) to have a completed at least 4 years of college. SmokerNonsmokerTotal High School326193 2-yr college51722 4+ year college137285 Total50150200

25 Copyright © 2009Pearson Education, Inc. Slide 3- 25 Don’t violate the area principle. While some people might like the pie chart on the left better, it is harder to compare fractions of the whole, which a well-done pie chart does. What Can Go Wrong?

26 Copyright © 2009Pearson Education, Inc. Slide 3- 26 What Can Go Wrong? (cont.) Keep it honest—make sure your display shows what it says it shows. This plot of the percentage of high-school students who engage in specified dangerous behaviors has a problem. Can you see it?

27 Copyright © 2009Pearson Education, Inc. Slide 3- 27 What Can Go Wrong? (cont.) Don’t confuse similar-sounding percentages— pay particular attention to the wording of the context. Example: The percentage of the passengers who were in first class and who survived: 203/2201 or 9.4% The percentage of first class passengers who survived: 203/325 or 62.5% Don’t forget to look at the variables separately too—examine the marginal distributions, since it is important to know how many cases are in each category.

28 Copyright © 2009Pearson Education, Inc. Slide 3- 28 What Can Go Wrong? (cont.) Be sure to use enough individuals! Do not make a report like “We found that 66.67% of the rats improved their performance with training. The other rat died.”

29 Copyright © 2009Pearson Education, Inc. Slide 3- 29 What Can Go Wrong? (cont.) Don’t overstate your case—don’t claim something you can’t. Don’t use unfair or silly averages—this could lead to Simpson’s Paradox, so be careful when you average one variable across different levels of a second variable. Table below is on-time flights by time of day and pilot Who is the better pilot? DayNightOverall Moe90 / 100 = 90% 10 / 20 = 50% 100 / 120 = 83% Jill19 / 20 = 95% 75 / 100 = 75% 94 / 120 = 78%

30 Copyright © 2009Pearson Education, Inc. Slide 3- 30 What have we learned? We can summarize categorical data by counting the number of cases in each category (expressing these as counts or percents). We can display the distribution in a bar chart or pie chart. And, we can examine two-way tables called contingency tables, examining marginal and/or conditional distributions of the variables. If conditional distributions of one variable are the same for every category of the other, the variables are independent.

31 Copyright © 2009Pearson Education, Inc. Slide 3- 31 Example (#32 in text) A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table: a) What % of cars surveyed were foreign? b) What % of the American cars were owned by students? c) What percent of the students owned American cars? d) What is the marginal distribution of origin? e) What are the conditional distributions of origin by driver classification? f) Do you think that the origin of the car is independent of the type of driver? Studentstaff American107105 European3312 Asian5547

32 Copyright © 2009Pearson Education, Inc. Slide 3- 32 A quick Intro to Stat Crunch CourseCompass -> Multimedia Library Select StatCrunch and then hit “FindNow” Then select StatCrunch StatCrunch is also located at http://campusweb.howardcc.edu/statcrunch/http://campusweb.howardcc.edu/statcrunch/ passcode:stats However, the chapter data sets are only available using the access through Course Compass. Find the car data set under Chapter 3 (use the navigation bar on the left) Create a bar chart (bar plot) of car origin Create 2 separate bar charts of car origin grouped by driver Create a pie chart of car origin grouped by driver What observations can you make?

33 Copyright © 2009Pearson Education, Inc. Slide 3- 33 Hand-in Homework The following table summarizes the blood pressure of a group of employees according to their age group 1. (0.5 point) How many employees 30 years or older were screened? 2. (1 point) Show the marginal distribution of age. Write your answer as counts AND percentages to two decimal places. 3. (1 point) Show the conditional distribution for the "normal" blood pressure group. Write your answer as counts AND percentages to two decimal places. 4. (0.5 point) What percent of the people 50 or older had high blood pressure? age under 30 30-49 50 and over blood low 2737 31 pressure normal 4891 93 high 2351 73


Download ppt "Copyright © 2009 Pearson Education, Inc. Chapter 3 Displaying and Describing Categorical Data."

Similar presentations


Ads by Google