Exploring Two Categorical Variables: Contingency Tables


1 Exploring Two Categorical Variables: Contingency Tables
QTM1310/ Sharpe Exploring Two Categorical Variables: Contingency Tables
Conditional Distributions
Variables may be restricted to show the distribution for just those cases that satisfy a specified condition. This is called a conditional distribution. Here are the preferences of the respondents from India and the U.K., which allows comparison of these responses.

2 Exploring Two Categorical Variables: Contingency Tables
Conditional Distributions
We may display the results of a conditional distribution as a pie chart or as a bar graph. The data from the previous table are displayed here as a side-by-side bar chart.

3 Exploring Two Categorical Variables: Contingency Tables
Conditional Distributions
Variables can be related in many ways, so it is typically easier to ask if they are not related. In a contingency table, when the distribution of one variable is the same for all categories of another variable, we say that the variables are independent.
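The independence idea can be sketched in a few lines of Python. This is only an illustration: the counts, the group names, and the 0.05 tolerance are all hypothetical, and comparing proportions against a tolerance is an informal check, not a statistical test.

```python
# Hypothetical counts (not from the slides): one row per group,
# one column per response category.
counts = {
    "Group A": {"Yes": 30, "No": 70},
    "Group B": {"Yes": 60, "No": 140},
    "Group C": {"Yes": 50, "No": 50},
}


def conditional_distribution(row):
    """Return the proportion of each response within a single group."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}


def roughly_independent(table, tolerance=0.05):
    """True if every group's conditional distribution is within
    `tolerance` of the first group's, category by category.
    The tolerance is an arbitrary choice made for illustration."""
    distributions = [conditional_distribution(row) for row in table.values()]
    reference = distributions[0]
    return all(
        abs(dist[category] - reference[category]) <= tolerance
        for dist in distributions
        for category in reference
    )


cond = {group: conditional_distribution(row) for group, row in counts.items()}
first_two = {group: counts[group] for group in ("Group A", "Group B")}
same_ab = roughly_independent(first_two)   # A and B share the same proportions
same_all = roughly_independent(counts)     # C differs, so this fails

print(cond)
print(same_ab, same_all)
```

Groups A and B have different sizes but identical proportions, which is why the comparison uses conditional distributions rather than raw counts.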

4 Exploring Two Categorical Variables: Contingency Tables
Segmented Bar Charts
Data can be displayed by dividing up bars rather than circles. The result is a segmented bar chart where a bar is divided proportionally into segments corresponding to the percentage in each group. The data from the conditional distribution pertaining to India and the U.K. are displayed here as segmented bar charts.

5 Exploring Two Categorical Variables: Contingency Tables
Example: Importance of Wealth
The GFK Roper Reports Worldwide survey in 2004 asked, “How important is acquiring wealth to you?” The percentages of respondents who said it was of more than average importance were: 71.9% in China, 59.6% in France, 76.1% in India, 45.5% in the UK, and 45.3% in the USA. Look at the following bar chart. How much larger is the proportion of those who said acquiring wealth was important in India than in the United States? Is that the impression given by the display? How would you improve this display?

6 Exploring Two Categorical Variables: Contingency Tables
Example: Importance of Wealth
The GFK Roper Reports Worldwide survey in 2004 asked, “How important is acquiring wealth to you?” The percentages of respondents who said it was of more than average importance were: 71.9% in China, 59.6% in France, 76.1% in India, 45.5% in the UK, and 45.3% in the USA. Look at the following bar chart. The statistics reveal that India’s percentage is less than twice that of the U.S., but the graph suggests India’s percentage is about 6 times as big as that of the U.S. The vertical scale beginning at 40% distorts the visual impression. Start the graph at 0%.
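The size of the distortion can be checked with simple arithmetic. The sketch below assumes the bars are drawn from a baseline of exactly 40%, as the slide describes; the variable names are illustrative.

```python
india, usa = 76.1, 45.3   # percentages quoted on the slide
axis_start = 40.0         # assumed baseline of the distorted chart

# Ratio of the actual percentages.
true_ratio = india / usa

# Ratio of the bar heights as drawn when the axis starts at 40%.
drawn_ratio = (india - axis_start) / (usa - axis_start)

print(round(true_ratio, 2), round(drawn_ratio, 1))
```

The true ratio is under 2, while the drawn bars differ by a factor of between 6 and 7, consistent with the slide's "about 6 times". With `axis_start = 0.0` the two ratios coincide, which is the reason for starting the graph at 0%.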

7 Exploring Two Categorical Variables: Contingency Tables
Example: Google financials
Google Inc. derives revenue from three major sources: advertising revenue from its websites, advertising revenue from the thousands of third-party websites that comprise the Google Network, and licensing and miscellaneous revenue. The following table shows the percentage of all revenue derived from these sources for the period 2002 to 2006. Are these row or column percentages? What percent of Google’s revenue came from advertising on its own websites in 2006?

8 Exploring Two Categorical Variables: Contingency Tables
Example: Google financials
Google Inc. derives revenue from three major sources: advertising revenue from its websites, advertising revenue from the thousands of third-party websites that comprise the Google Network, and licensing and miscellaneous revenue. The following table shows the percentage of all revenue derived from these sources for the period 2002 to 2006. These are column percentages because the columns each add to 100%, while the row sums are greater than 100%. In 2006, 60% of Google’s revenue came from advertising on its own websites.
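The reasoning used to tell row percentages from column percentages can be sketched as a small check on the sums. The table below is hypothetical (it is not Google's data, which is not reproduced in this transcript), and `percentage_type` is an illustrative helper name.

```python
# Hypothetical percentage table, NOT the figures from the slide:
# rows are revenue sources, columns are years.
table = {
    "Source A": {"Year 1": 70.0, "Year 2": 55.0},
    "Source B": {"Year 1": 25.0, "Year 2": 40.0},
    "Source C": {"Year 1": 5.0, "Year 2": 5.0},
}


def percentage_type(table, tolerance=0.5):
    """Classify a table as holding 'row' or 'column' percentages by
    checking which set of sums is (within rounding) equal to 100."""
    row_sums = [sum(row.values()) for row in table.values()]
    columns = list(next(iter(table.values())).keys())
    col_sums = [sum(row[c] for row in table.values()) for c in columns]

    rows_ok = all(abs(s - 100) <= tolerance for s in row_sums)
    cols_ok = all(abs(s - 100) <= tolerance for s in col_sums)

    if cols_ok and not rows_ok:
        return "column"
    if rows_ok and not cols_ok:
        return "row"
    return "undetermined"


kind = percentage_type(table)
print(kind)
```

The small tolerance allows for published percentages that have been rounded and so may not add to exactly 100.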

9 Exploring Two Categorical Variables: Contingency Tables
Example: MBAs
A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. What % of all MBA students were from North America? What % of the Two-Year MBAs were from North America? What % of the Evening MBAs were from North America?

10 Exploring Two Categorical Variables: Contingency Tables
Example: MBAs
A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. 62.7% of all MBA students were from North America. 62.8% of the Two-Year MBAs were from North America. % of the Evening MBAs were from North America.

11 Exploring Two Categorical Variables: Contingency Tables
Example: MBAs
A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. What is the marginal distribution of origin?

12 Exploring Two Categorical Variables: Contingency Tables
Example: MBAs
A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. The marginal distribution of origin is 23.9% from Asia, % Europe, % Latin America, % Middle East, and 62.7% North America.
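A marginal distribution is obtained from the row (or column) totals divided by the grand total. The sketch below uses hypothetical counts, since the slide's table is not part of this transcript; only the method carries over.

```python
# Hypothetical counts (the slide's actual table is not available here):
# rows are regions of origin, columns are MBA programs.
counts = {
    "Asia": {"Two-Year": 30, "Evening": 10},
    "Europe": {"Two-Year": 5, "Evening": 5},
    "North America": {"Two-Year": 60, "Evening": 90},
}

# Grand total of all students in the table.
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of origin: each row total as a percent of everyone.
marginal_origin = {
    origin: 100 * sum(row.values()) / grand_total
    for origin, row in counts.items()
}

print(grand_total)
print(marginal_origin)
```

The program columns play no role here: the marginal distribution of origin ignores how students split between programs, which is what distinguishes it from a conditional distribution.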

13 Exploring Two Categorical Variables: Contingency Tables
Example: MBAs
A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. Obtain the column percentages and show the conditional distributions of origin by MBA program. Do you think that origin of the MBA student is independent of the MBA program?

14 Exploring Two Categorical Variables: Contingency Tables
Example: MBAs
A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table. Origin of the MBA student is not independent of the MBA program because the conditional distributions appear to be different. For example, the % from Latin America among those in Two-Year programs is nearly %, while among those in Evening programs it is less than 1%.

15 Don’t violate the area principle. Keep it honest.
The pie chart below is confusing because the percentages add up to more than 100% and the 50% piece of pie looks smaller than 50%.

16 Keep it honest.
The scale of the years changes from two-year intervals to one-year intervals for the last three values, which makes comparison awkward.

17 Be sure to use enough individuals in gathering data.
Don’t confuse percentages: differences in what a percentage represents need to be clearly identified. Don’t forget to look at the variables separately in contingency tables and through marginal distributions. Be sure to use enough individuals in gathering data. Don’t overstate your case. You can only conclude what your data suggest. Other studies under other circumstances may find different results. Don’t use unfair or inappropriate percentages.

18 Simpson’s Paradox
The data below suggest that Peter is a more successful sales rep because his overall success rate is 83% compared to Katrina’s 78%. Yet the same data suggest that Katrina is more successful, because she has the higher success percentage for each product. This is known as Simpson’s Paradox and occurs because percentages are inappropriately combined.
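The paradox is easy to reproduce numerically. The counts below are assumptions chosen so that the overall rates round to the 83% and 78% quoted above; the slide's own table is not part of this transcript, and the product names are placeholders.

```python
# Assumed (sales made, attempts) per product. These counts are illustrative,
# picked to reproduce the quoted overall rates of about 83% and 78%.
results = {
    "Peter": {"Product 1": (90, 100), "Product 2": (10, 20)},
    "Katrina": {"Product 1": (19, 20), "Product 2": (75, 100)},
}


def rate(successes, attempts):
    """Success rate as a percentage."""
    return 100 * successes / attempts


# Overall rate: pool all attempts for each rep, then divide.
overall = {}
for rep, products in results.items():
    won = sum(s for s, _ in products.values())
    tried = sum(n for _, n in products.values())
    overall[rep] = rate(won, tried)

# Per-product rates: compare the reps within each product separately.
per_product = {
    rep: {product: rate(s, n) for product, (s, n) in products.items()}
    for rep, products in results.items()
}

print(overall)
print(per_product)
```

Under these assumed counts Katrina has the higher rate on both products, yet Peter has the higher pooled rate, because most of his attempts were on the product that is easier to sell. Pooling across products with such different mixes is the inappropriate combination the slide warns about.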

19 What Have We Learned? Make and interpret a frequency table for a categorical variable. • We can summarize categorical data by counting the number of cases in each category, sometimes expressing the resulting distribution as percentages. Make and interpret a bar chart or pie chart. • We display categorical data using the area principle in either a bar chart or a pie chart. Make and interpret a contingency table. • When we want to see how two categorical variables are related, we put the counts (and/or percentages) in a two-way table called a contingency table.

20 What Have We Learned? Make and interpret bar charts and pie charts of marginal distributions. • We look at the marginal distribution of each variable (found in the margins of the table). We also look at the conditional distribution of a variable within each category of the other variable. • Comparing conditional distributions of one variable across categories of another tells us about the association between variables. If the conditional distributions of one variable are (roughly) the same for every category of the other, the variables are independent. © 2010 Pearson Education

