Copyright © 2012 Pearson Education. Chapter 4 Displaying and Describing Categorical Data.


1 Copyright © 2012 Pearson Education. Chapter 4 Displaying and Describing Categorical Data

2 Copyright © 2012 Pearson Education. 4-2 4.1 Summarizing a Categorical Variable The Three Rules of Data Analysis: 1. Make a picture. 2. Make a picture. 3. Make a picture. Pictures reveal things that can’t be seen in a table of numbers, show important features and patterns in the data, and provide an excellent means for reporting findings to others.

3 Copyright © 2012 Pearson Education. 4-3 4.1 Summarizing a Categorical Variable A frequency table organizes data by recording totals and category names as in the table below. The names of the categories label each row in the frequency table. When the number of categories gets too large (around 12), values may be lumped together in an “Other” category.

4 Copyright © 2012 Pearson Education. 4-4 4.1 Summarizing a Categorical Variable A relative frequency table displays the percentages that lie in each category rather than the counts. (See the table below.) The percentage of the data in each category is found by dividing the count in each category by the total number of cases and multiplying by 100.
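The calculation described on this slide can be sketched in a few lines of Python. This is only an illustration: the `visits` data and the function name `relative_frequency_table` are made up for the example and do not come from the slides.

```python
from collections import Counter


def relative_frequency_table(values):
    """Return {category: (count, percent)} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    # Percent = count in the category / total number of cases * 100.
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


# Hypothetical data (not from the slides): how each visitor reached a site.
visits = ["Google"] * 6 + ["Direct"] * 3 + ["Yahoo"] * 1

table = relative_frequency_table(visits)
for category, (count, percent) in table.items():
    print(f"{category:<8}{count:>4}{percent:>8.1f}%")
```

The percentages in a relative frequency table should add up to 100 (apart from rounding), which is a quick sanity check on any such table.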

5 Copyright © 2012 Pearson Education. 4-5 4.2 Displaying a Categorical Variable The Area Principle The figure shown distorts the data from the frequency tables on the previous slides (data on internet sandal sales). Although the lengths of the sandals do match the data, our eyes tend to be impressed by the area, which can be misleading.

6 Copyright © 2012 Pearson Education. 4-6 4.2 Displaying a Categorical Variable The Area Principle The best data displays observe the area principle: the area occupied by a part of the graph should correspond to the magnitude of the value it represents.

7 Copyright © 2012 Pearson Education. 4-7 4.2 Displaying a Categorical Variable Bar Charts A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison. The bar graph here gives a more accurate visual impression of the sandal data.

8 Copyright © 2012 Pearson Education. 4-8 4.2 Displaying a Categorical Variable Bar Charts If the counts are replaced with percentages, the data can be displayed in a relative frequency bar chart. The relative frequency bar chart looks the same as the bar chart, but shows the proportion of visits in each category rather than the counts.

9 Copyright © 2012 Pearson Education. 4-9 4.2 Displaying a Categorical Variable Pie Charts Pie charts show the whole group of cases as a circle sliced into pieces with sizes proportional to the fraction of the whole in each category. The KEEN Inc. data is displayed below.

10 Copyright © 2012 Pearson Education. 4-10 4.2 Displaying a Categorical Variable Before making a bar chart or pie chart: check that the data satisfy the Categorical Data Condition (the data are counts or percentages of individuals in categories); be sure the categories don’t overlap; and consider what you are attempting to communicate about the data.

11 Copyright © 2012 Pearson Education. 4-11 4.2 Displaying a Categorical Variable Example: U.S. Market Share An article in The Wall Street Journal (March 16, 2007) reported the 2006 U.S. market share of leading sellers of carbonated drinks, summarized in the following pie chart. Is this an appropriate display for these data? Explain. Which company had the largest share of the market?

12 Copyright © 2012 Pearson Education. 4-12 4.2 Displaying a Categorical Variable Example: U.S. Market Share An article in The Wall Street Journal (March 16, 2007) reported the 2006 U.S. market share of leading sellers of carbonated drinks, summarized in the following pie chart. Yes, this is an appropriate display for these data because the categories divide the whole. Coca-Cola had the largest share of the market.

13 Copyright © 2012 Pearson Education. 4-13 4.2 Displaying a Categorical Variable Example: U.S. Market Share An article in The Wall Street Journal (March 16, 2007) reported the 2006 U.S. market share of leading sellers of carbonated drinks, summarized in the bar chart below. Compare to the previously shown pie chart. Which is better for displaying the relative portions of market share? What is missing from this display that might make it misleading?

14 Copyright © 2012 Pearson Education. 4-14 4.2 Displaying a Categorical Variable Example: U.S. Market Share An article in The Wall Street Journal (March 16, 2007) reported the 2006 U.S. market share of leading sellers of carbonated drinks, summarized in the bar chart below. Compare to the previously shown pie chart. The pie chart is better for displaying the relative portions of market share. The category “Other” is missing from this display, which might make the graph misleading.

15 Copyright © 2012 Pearson Education. 4-15 4.3 Exploring Two Categorical Variables: Contingency Tables Example: Data was collected on the strength of consumers’ preferences for regional foods in their country. The data is displayed in the frequency table and clarified with a pie chart.

16 Copyright © 2012 Pearson Education. 4-16 4.3 Exploring Two Categorical Variables: Contingency Tables To show how opinions on regional foods varied by countries, we can display the data in a contingency table where we have added the countries as a new variable.

17 Copyright © 2012 Pearson Education. 4-17 4.3 Exploring Two Categorical Variables: Contingency Tables The marginal distribution of a variable in a contingency table gives the total count for each value of that variable, summed over all values of the other variable. These totals appear in the margins of the table. Here the indicated marginal total shows that there were 1502 respondents from China.
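As a rough sketch of how marginal totals are obtained, the snippet below sums a small contingency table along each direction. The counts, the labels "Country A" and "Country B", and the helper names are hypothetical; the survey table itself is not reproduced in this transcript.

```python
# Hypothetical counts (not the survey data): rows are countries,
# columns are responses to the question.
counts = {
    "Country A": {"Agree": 30, "Neutral": 10, "Disagree": 20},
    "Country B": {"Agree": 15, "Neutral": 5, "Disagree": 20},
}


def row_totals(table):
    """Marginal distribution of the row variable (total per row)."""
    return {row: sum(cells.values()) for row, cells in table.items()}


def column_totals(table):
    """Marginal distribution of the column variable (total per column)."""
    totals = {}
    for cells in table.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals


print(row_totals(counts))
print(column_totals(counts))
```

Both sets of marginal totals add up to the same grand total, here 100 respondents.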

18 Copyright © 2012 Pearson Education. 4-18 4.3 Exploring Two Categorical Variables: Contingency Tables Each cell of a contingency table (any intersection of a row and column of the table) gives the count for a combination of values of the two variables. Here the indicated cell shows that 4 respondents from India didn’t know how they felt about the question asked.

19 Copyright © 2012 Pearson Education. 4-19 4.3 Exploring Two Categorical Variables: Contingency Tables Rather than displaying the data as counts, a table may display the data as a percentage – as a total percent, row percent, or column percent, which show percentages with respect to the total count, row count, or column count, respectively. We see that 6.74% of all respondents were from China and agreed completely with the question asked.
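The three kinds of percentages differ only in the denominator. The sketch below computes all three for one cell of a small table; the counts and the function name `cell_percents` are hypothetical and are not the survey figures from the slide.

```python
# Hypothetical counts (not the survey data).
counts = {
    "Country A": {"Agree": 30, "Neutral": 10, "Disagree": 20},
    "Country B": {"Agree": 15, "Neutral": 5, "Disagree": 20},
}


def cell_percents(table, row, col):
    """Return (total %, row %, column %) for one cell of a contingency table."""
    cell = table[row][col]
    grand_total = sum(n for cells in table.values() for n in cells.values())
    row_total = sum(table[row].values())
    col_total = sum(cells[col] for cells in table.values())
    return (
        100 * cell / grand_total,  # share of all respondents
        100 * cell / row_total,    # share within this row
        100 * cell / col_total,    # share within this column
    )


total_pct, row_pct, col_pct = cell_percents(counts, "Country A", "Agree")
print(f"total {total_pct:.1f}%, row {row_pct:.1f}%, column {col_pct:.1f}%")
```

The same cell count of 30 yields three different percentages, which is why a table of percentages should always state which denominator was used.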

20 Copyright © 2012 Pearson Education. 4-20 4.3 Exploring Two Categorical Variables: Contingency Tables Conditional Distributions Variables may be restricted to show the distribution for just those cases that satisfy a specified condition. This is called a conditional distribution. Here are the preferences of the respondents from India and the U.K., shown together so that the responses can be compared.

21 Copyright © 2012 Pearson Education. 4-21 4.3 Exploring Two Categorical Variables: Contingency Tables Conditional Distributions We may display the results of a conditional distribution as a pie chart or as a bar graph. The data from the previous table is displayed here as a side-by-side bar chart.

22 Copyright © 2012 Pearson Education. 4-22 4.3 Exploring Two Categorical Variables: Contingency Tables Conditional Distributions Variables can be related in many ways, so it is typically easier to ask if they are not related. In a contingency table, when the distribution of one variable is the same for all categories of another variable, we say that the variables are independent.
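One informal way to apply this definition is to compute the conditional distribution within each row and see whether they roughly agree. The sketch below does that with made-up tables. The `tolerance` of one percentage point is an arbitrary assumption for illustration and not a formal statistical test.

```python
def conditional_distributions(table):
    """Percent distribution of the column variable within each row."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


def looks_independent(table, tolerance=1.0):
    """True if every row's conditional distribution is within `tolerance`
    percentage points of the first row's (an informal check only)."""
    dists = list(conditional_distributions(table).values())
    first = dists[0]
    return all(
        abs(d[col] - first[col]) <= tolerance
        for d in dists[1:]
        for col in first
    )


# Hypothetical tables (not from the slides).
same_shape = {"Group A": {"Yes": 20, "No": 80}, "Group B": {"Yes": 10, "No": 40}}
different = {"Group A": {"Yes": 50, "No": 50}, "Group B": {"Yes": 10, "No": 90}}

print(looks_independent(same_shape))
print(looks_independent(different))
```

In `same_shape` both groups split 20% to 80% even though the group sizes differ, so the check passes. In `different` the splits are 50/50 and 10/90, so it fails.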

23 Copyright © 2012 Pearson Education. 4-23 4.3 Exploring Two Categorical Variables: Contingency Tables Segmented Bar Charts Data can be displayed by dividing up bars rather than circles. The result is a segmented bar chart where a bar is divided proportionally into segments corresponding to the percentage in each group. The data from the conditional distribution pertaining to India and the U.K. are displayed here as segmented bar charts.

24 Copyright © 2012 Pearson Education. 4-24 4.3 Exploring Two Categorical Variables: Contingency Tables Example: Importance of Wealth A GFK Roper Reports Worldwide survey in 2004 asked “How important is acquiring wealth to you?” The percentages who responded that it was of more than average importance were: 71.9% China, 59.6% France, 76.1% India, 45.5% UK, and 45.3% USA. Look at the following bar chart. How much larger is the proportion of those who said acquiring wealth was important in India than in the United States? Is that the impression given by the display? How would you improve this display?

25 Copyright © 2012 Pearson Education. 4-25 4.3 Exploring Two Categorical Variables: Contingency Tables Example: Importance of Wealth A GFK Roper Reports Worldwide survey in 2004 asked “How important is acquiring wealth to you?” The percentages who responded that it was of more than average importance were: 71.9% China, 59.6% France, 76.1% India, 45.5% UK, and 45.3% USA. Look at the following bar chart. The statistics reveal that India’s percentage is less than twice that of the U.S. (76.1% versus 45.3%), but the graph suggests India’s percentage is about 6 times as big as that of the U.S. The vertical scale beginning at 40% distorts the visual impression. To improve the display, start the graph at 0%.
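The distortion can be checked with simple arithmetic: compare the ratio of the two percentages with the ratio of the bar heights that remain once the axis starts at 40%. The sketch below uses the India and USA figures quoted on the slide. The function name `bar_height_ratio` is made up for the example.

```python
def bar_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


india, usa = 76.1, 45.3  # percentages quoted on the slide

true_ratio = bar_height_ratio(india, usa)                    # axis starts at 0%
apparent_ratio = bar_height_ratio(india, usa, baseline=40.0)  # axis starts at 40%

print(f"true ratio:     {true_ratio:.2f}")
print(f"apparent ratio: {apparent_ratio:.1f}")
```

The true ratio comes out at roughly 1.7, while the ratio of the visible bar heights is between 6 and 7, in line with the slide's "about 6 times". Starting the axis at 0% makes the two ratios coincide.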

26 Copyright © 2012 Pearson Education. 4-26 4.3 Exploring Two Categorical Variables: Contingency Tables Example: Google financials Google Inc. derives revenue from three major sources: advertising revenue from their websites, advertising revenue from the thousands of third party websites that comprise the Google Network, and licensing and miscellaneous revenue. The following table shows the percentage of all revenue derived from these sources for the period 2002 to 2006. Are these row or column percentages? What percent of Google’s revenue came from their website’s advertising in 2006?

27 Copyright © 2012 Pearson Education. 4-27 4.3 Exploring Two Categorical Variables: Contingency Tables Example: Google financials Google Inc. derives revenue from three major sources: advertising revenue from their websites, advertising revenue from the thousands of third party websites that comprise the Google Network, and licensing and miscellaneous revenue. The following table shows the percentage of all revenue derived from these sources for the period 2002 to 2006. These are column percentages because the row sums are greater than 100% but columns add to 100%. 60% of Google’s revenue in 2006 came from their website’s advertising.
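The reasoning on this slide (check which direction sums to 100%) can be sketched as a small helper. The table below is hypothetical: its labels and values are invented for illustration and are not Google's reported figures. The `tolerance` for rounding error is also an assumption.

```python
def percent_orientation(table, tolerance=0.5):
    """Guess whether a table of percentages holds row or column percentages
    by checking which direction sums to about 100."""
    rows = list(table)
    cols = list(table[rows[0]])
    rows_ok = all(abs(sum(table[r].values()) - 100) <= tolerance for r in rows)
    cols_ok = all(
        abs(sum(table[r][c] for r in rows) - 100) <= tolerance for c in cols
    )
    if cols_ok and not rows_ok:
        return "column"
    if rows_ok and not cols_ok:
        return "row"
    return "ambiguous" if rows_ok else "neither"


# Hypothetical percentages (not the figures from the slide's table):
# rows are revenue sources, columns are years.
revenue_share = {
    "Source A": {"Year 1": 70, "Year 2": 60},
    "Source B": {"Year 1": 25, "Year 2": 39},
    "Source C": {"Year 1": 5, "Year 2": 1},
}

print(percent_orientation(revenue_share))
```

A result of "neither" would suggest total percentages, where only the whole table sums to 100%, or a table that omits some categories.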

28 Copyright © 2012 Pearson Education. 4-28 4.3 Exploring Two Categorical Variables: Contingency Tables Example: MBAs A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. What % of all MBA students were from North America? What % of the Two-Year MBAs were from North America? What % of the Evening MBAs were from North America?

29 Copyright © 2012 Pearson Education. 4-29 4.3 Exploring Two Categorical Variables: Contingency Tables Example: MBAs A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. 62.7% of all MBA students were from North America. 62.8% of the Two-Year MBAs were from North America. 62.5% of the Evening MBAs were from North America.

30 Copyright © 2012 Pearson Education. 4-30 4.3 Exploring Two Categorical Variables: Contingency Tables Example: MBAs A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. What is the marginal distribution of origin?

31 Copyright © 2012 Pearson Education. 4-31 4.3 Exploring Two Categorical Variables: Contingency Tables Example: MBAs A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. The marginal distribution of origin is 23.9% from Asia, 1.9% from Europe, 7.8% from Latin America, 3.7% from the Middle East, and 62.7% from North America.

32 Copyright © 2012 Pearson Education. 4-32 4.3 Exploring Two Categorical Variables: Contingency Tables Example: MBAs A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. Obtain the column percentages and show the conditional distributions of origin by MBA Program. Do you think that origin of the MBA student is independent of the MBA programs?

33 Copyright © 2012 Pearson Education. 4-33 4.3 Exploring Two Categorical Variables: Contingency Tables Example: MBAs A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. Origin of the MBA student is not independent of the MBA program because the conditional distributions appear to be different. For example, the percentage from Latin America is nearly 20% among those in Two-Year programs but less than 1% among those in Evening programs.

34 Copyright © 2012 Pearson Education. 4-34 Don’t violate the area principle. Keep it honest. The pie chart below is confusing because the percentages add up to more than 100% and the 50% piece of pie looks smaller than 50%.

35 Copyright © 2012 Pearson Education. 4-35 Keep it honest. The scale of the years changes from two-year intervals to one-year intervals for the last three values, which makes comparison awkward.

36 Copyright © 2012 Pearson Education. 4-36 Don’t confuse percentages – differences in what a percentage represents need to be clearly identified. Don’t forget to look at the variables separately in contingency tables and through marginal distributions. Be sure to use enough individuals in gathering data. Don’t overstate your case. You can only conclude what your data suggest. Other studies under other circumstances may find different results. Don’t use unfair or inappropriate percentages.

37 Copyright © 2012 Pearson Education. 4-37 Simpson’s Paradox The data below suggest that Peter is the more successful sales rep because his overall success rate is 83% compared to Katrina’s 78%. Broken down by product, however, the data suggest that Katrina is more successful because she has the higher success rate for each product. This reversal is known as Simpson’s Paradox and occurs because percentages are inappropriately combined across groups of different sizes.
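The reversal is easier to see with numbers. The slide's table is not reproduced in this transcript, so the counts below are illustrative assumptions chosen to reproduce the 83% and 78% overall rates quoted above. The product labels and function names are likewise made up.

```python
# Illustrative (successes, attempts) per product. These counts are assumed,
# chosen so the overall rates round to the 83% and 78% quoted on the slide.
sales = {
    "Peter": {"Product 1": (90, 100), "Product 2": (10, 20)},
    "Katrina": {"Product 1": (19, 20), "Product 2": (75, 100)},
}


def rate(successes, attempts):
    """Success rate as a percentage."""
    return 100 * successes / attempts


def overall_rate(by_product):
    """Success rate after pooling all products together."""
    successes = sum(s for s, _ in by_product.values())
    attempts = sum(a for _, a in by_product.values())
    return rate(successes, attempts)


for rep, by_product in sales.items():
    per_product = {p: round(rate(s, a)) for p, (s, a) in by_product.items()}
    print(rep, per_product, f"overall {overall_rate(by_product):.0f}%")
```

With these counts Katrina has the higher rate for each product, yet Peter has the higher pooled rate, because most of his attempts are on the product that is easier to sell. Comparing within each product avoids the misleading pooled comparison.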

38 Copyright © 2012 Pearson Education. 4-38 Make and interpret a frequency table for a categorical variable. We can summarize categorical data by counting the number of cases in each category, sometimes expressing the resulting distribution as percentages. Make and interpret a bar chart or pie chart. We display categorical data using the area principle in either a bar chart or a pie chart. Make and interpret a contingency table. When we want to see how two categorical variables are related, we put the counts (and/or percentages) in a two-way table called a contingency table.

39 Copyright © 2012 Pearson Education. 4-39 Make and interpret bar charts and pie charts of marginal distributions. We look at the marginal distribution of each variable (found in the margins of the table). We also look at the conditional distribution of a variable within each category of the other variable. Comparing conditional distributions of one variable across categories of another tells us about the association between variables. If the conditional distributions of one variable are (roughly) the same for every category of the other, the variables are independent.

