Unit 2 Exploring Data: Comparisons and Relationships Topic 7 Comparing Distributions II: Categorical Variables (page 137)

1 Unit 2 Exploring Data: Comparisons and Relationships Topic 7 Comparing Distributions II: Categorical Variables (page 137)

2 OVERVIEW In the previous topic you encountered the notion of a statistical tendency and studied techniques for comparing distributions of quantitative variables. In this topic you will study some basic techniques for comparing distributions of categorical variables.

3 PK Note: This topic is on categorical variables, so no lists need to be downloaded. Remember that a categorical variable is one that simply records the category into which a person or thing falls on the characteristic in question.

4 OVERVIEW These techniques involve the analysis of two-way tables of counts. You will use no more complicated mathematical operations than addition and calculation of proportions, but you will acquire some very powerful analytical tools.

5 Do the Preliminaries (page 138 & 139) Definitions

6 #10 - Political Views Liberal – open to new behavior or opinions and willing to discard traditional values. Moderate – not radical or excessively right or left wing. Conservative – holding to traditional attitudes and values and cautious about change or innovation.

7 Essential Question How is a two-way table used to summarize information from a pair of categorical variables? How can you compare and contrast categorical variables from two or more groups?

8 Activity 7-1: Suitability for Politics (pages 139 & 140) (a) political inclination: categorical. suitability opinion: categorical (binary).

9 (a) political inclination: categorical. suitability opinion: categorical (binary). Often we are interested in considering one variable, the response (dependent or y) variable, as being affected or predicted by the other variable, the explanatory (independent or x) variable.

10 (b) explanatory variable: political inclination. response variable: suitability opinion. (c) This table is called a two-way table since it classifies each person according to two variables. The explanatory variable should be in columns and the response variable should be in rows.

                           liberal   moderate   conservative
agree with statement
disagree with statement

11 Essential Questions How can a segmented bar graph be used to represent information in two-way tables? How can you compare and contrast categorical variables from two or more groups?

12 Activity 7-2: Age and Political Interest (pages 140 to 144) A two-way table classifies each case (or observational unit) according to 2 variables.

13 (a) 30% of the survey respondents were between ages 18 and 35 years old (385 ÷ 1265 = 0.3043…). (b) 42% of the survey respondents were between ages 36 and 55 years old (531 ÷ 1265 = 0.41976…). (c) 28% of the survey respondents were over 55 years old (349 ÷ 1265 = 0.275889…).
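The three percentages above form the marginal distribution of age. A minimal Python sketch of that calculation follows; the counts are the ones on the slide, while the function name `marginal_distribution` and the dictionary layout are illustrative choices, not part of the course materials.

```python
# Age-group totals as given on the slide (number of survey respondents).
age_counts = {"18-35": 385, "36-55": 531, "56-94": 349}


def marginal_distribution(counts):
    """Return each category's share of the grand total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


age_marginal = marginal_distribution(age_counts)
for category, share in age_marginal.items():
    print(f"{category}: {share:.1%}")
```

Because every share is divided by the same grand total (1265), the shares add up to 1.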

14 Marginal distributions are calculated for one variable. We used bar graphs to represent marginal distributions graphically. To study possible relationships between two categorical variables, one examines conditional distributions. Conditional distributions are distributions of one variable for given categories of the other variable.

15 (d) 38% of the young respondents classify themselves as having not much interest in politics (146 ÷ 385 = 0.3792207…). (e) 50% of the young respondents classify themselves as somewhat interested in politics (192 ÷ 385 = 0.498701…). (f) 12% of the young respondents classify themselves as very much interested in politics (47 ÷ 385 = 0.1220779…). (g)

             18 - 35   36 - 55   56 - 94
not much      0.379     0.275     0.255
somewhat      0.499     0.490     0.441
very much     0.122     0.235     0.304
total         1.000     1.000     1.000
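The 18 - 35 column of the table is a conditional distribution: each count is divided by the total for that age group only, not by the whole sample. A short sketch in Python, using the three counts quoted on the slide (the variable names are illustrative):

```python
# Political-interest counts for respondents aged 18-35, as quoted on the slide.
young_counts = {"not much": 146, "somewhat": 192, "very much": 47}

# Condition on the age group: divide by the 18-35 total (385), not by 1265.
young_total = sum(young_counts.values())
young_conditional = {
    level: count / young_total for level, count in young_counts.items()
}

for level, proportion in young_conditional.items():
    print(f"{level}: {proportion:.3f}")
```

The same steps, applied to the counts for the other two age groups, would give the remaining columns.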

16 (h) [Segmented bar graph: vertical axis from 0% to 100%; one bar per age category (18 - 35, 36 - 55, 56 - 94); segments: not much, somewhat, very much.] Conditional distributions can be represented visually with segmented bar graphs.

17 [Segmented bar graph: vertical axis from 0% to 100%; one bar per age category (18 - 35, 36 - 55, 56 - 94); segments: not much, somewhat, very much.] (i) Does the distribution of political interest seem to differ among the 3 age groups? Yes. If so, describe the key features … Younger respondents are more likely to report not much interest in politics (0.379 of those aged 18 - 35 versus 0.255 of those aged 56 - 94), while older respondents are more likely to report very much interest (0.304 versus 0.122).

18 (j) 27% of respondents aged 36 - 55 classify themselves as having not much interest in politics (146 ÷ 531 = 0.27495…). (k) 38% of those with not much interest in politics are of age 36 - 55 (146 ÷ 381 = 0.38320…). (l) 12% of the 1265 respondents identified themselves as being both between ages 36 - 55 and having not much political interest (146 ÷ 1265 = 0.115415…). When dealing with conditional proportions, it is very important to keep straight which category is the one being conditioned on.
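All three answers share the numerator 146; only the denominator changes with what is being conditioned on. A small Python sketch of that point, using the totals quoted on the slide (the variable names are illustrative):

```python
# Counts quoted on the slide.
count_both = 146        # aged 36-55 AND not much interest in politics
age_36_55_total = 531   # all respondents aged 36-55
not_much_total = 381    # all respondents with not much interest
grand_total = 1265      # all respondents

# (j) Conditioned on age: of those aged 36-55, what share has not much interest?
p_not_much_given_age = count_both / age_36_55_total

# (k) Conditioned on interest: of those with not much interest, what share is 36-55?
p_age_given_not_much = count_both / not_much_total

# (l) Not conditioned at all: share of everyone who falls in both categories.
p_both = count_both / grand_total

print(f"(j) {p_not_much_given_age:.3f}")
print(f"(k) {p_age_given_not_much:.3f}")
print(f"(l) {p_both:.3f}")
```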

19 Assignment Activity 7-8: Gender-Stereotypical Toy Advertising (page 151) How can you compare and contrast categorical variables from two or more groups?

20 Essential Question What is the concept of relative risk? How can you compare and contrast categorical variables from two or more groups?

21 Activity 7-3: Pregnancy, AZT, & HIV (pages 144 & 145) (a) ? is my estimate for AZT-receiving women with HIV-positive babies. ? is my estimate for placebo-receiving women with HIV-positive babies.

22 (b) 8% is the actual proportion for AZT-receiving women with HIV-positive babies (13 ÷ 164 = 0.079268…). 25% is the actual proportion for placebo-receiving women with HIV-positive babies (40 ÷ 160 = 0.25). (c) The proportion of HIV-positive babies among placebo-receiving mothers is about 3 times as large as the proportion of HIV-positive babies among AZT-receiving mothers (0.25 ÷ 0.08 = 3.125).

23 You have calculated the relative risk of having an HIV-positive baby between the AZT and placebo groups. If the response variable categories are incidence and non-incidence of a disease, then the relative risk is the ratio of the proportions having the disease between the two groups of the explanatory variable. Definition: in ⋅ ci ⋅ dence [in-si-duhns] noun the rate or range of occurrence or influence of something, especially of something unwanted … the high incidence of heart disease in men over 40.
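As a quick check of the definition, the relative risk for the AZT study can be computed in a few lines of Python. This is only a sketch: the counts (13 of 164 and 40 of 160) come from the previous slide, and the function name `relative_risk` is an illustrative choice.

```python
def relative_risk(events_a, n_a, events_b, n_b):
    """Ratio of the incidence proportion in group A to that in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Group A: placebo (40 HIV-positive babies out of 160 mothers).
# Group B: AZT (13 HIV-positive babies out of 164 mothers).
rr = relative_risk(40, 160, 13, 164)
print(f"relative risk (placebo vs. AZT): {rr:.2f}")
```

Using the unrounded proportions gives a value slightly above the 3.125 obtained from the rounded 0.08; either way the relative risk is about 3.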

24 (d) [comment] The difference between the 2 groups appears to be important. Based on these results, it appears that the drug AZT should be used for HIV-positive pregnant women. Please note that a placebo contains no active drug. It is a “sugar pill”; any effect it has is psychological.

25 Assignment Activity 7-13: Driver Safety (continued) (page 153) Show car accident PowerPoint! How can you compare and contrast categorical variables from two or more groups?

26 Essential Question What is Simpson’s Paradox? How can you compare and contrast categorical variables from two or more groups?

27 Activity 7-4: Hypothetical Hospital Recovery Rates (pages 146 & 147) (a) 80% of Hospital A’s patients survived. 90% of Hospital B’s patients survived. Hospital B saved the higher percentage of its patients. (b) Are you convinced? Yes (Check the totals first.)

28 (c) Patients in FAIR condition: 98% of Hospital A’s patients survived. 97% of Hospital B’s patients survived. Hospital A saved the greater percentage of its patients. (d) Patients in POOR condition: 53% of Hospital A’s patients survived. 30% of Hospital B’s patients survived. Hospital A saved the greater percentage of its patients.

29 This phenomenon is called Simpson’s paradox. This refers to the fact that aggregate (whole) proportions can reverse the direction of the relationship seen in the individual pieces.

30 (e) [comment] Hospital A treats more patients in “poor” condition. These patients are more likely to die than those in “fair” condition. Therefore, Hospital A’s overall survival rate is lower than Hospital B’s rate despite being higher for each type of patient. (f) I would prefer to go to Hospital A, because it has a higher survival rate regardless of one’s condition.
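The reversal can be reproduced numerically. The sketch below is in Python and rests on an assumption: the slides give only percentages, so the patient counts here are illustrative values chosen to agree with those percentages after rounding (80% and 90% overall, 98% and 97% for fair, 53% and 30% for poor). They are not taken from the slides.

```python
# (survived, total) per hospital and patient condition.
# ASSUMPTION: illustrative counts chosen to match the slide percentages
# after rounding; the slides themselves do not list these counts.
patients = {
    "A": {"fair": (590, 600), "poor": (210, 400)},
    "B": {"fair": (870, 900), "poor": (30, 100)},
}


def rate(survived, total):
    """Survival proportion within one group of patients."""
    return survived / total


def overall_rate(by_condition):
    """Survival proportion after pooling all conditions for one hospital."""
    survived = sum(s for s, _ in by_condition.values())
    total = sum(t for _, t in by_condition.values())
    return survived / total


for hospital, by_condition in patients.items():
    parts = ", ".join(
        f"{cond}: {rate(*counts):.1%}" for cond, counts in by_condition.items()
    )
    print(f"Hospital {hospital}: overall {overall_rate(by_condition):.1%} ({parts})")
```

With these counts Hospital A wins within each condition but loses overall, because 400 of its 1000 patients are in poor condition compared with 100 of Hospital B's 1000.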

31 Assignment Activity 7-20: Graduate Admissions Discrimination (pages 155 & 156) Assignment Activity 7-21: Softball Batting Averages (pages 156 & 157) Quiz on Topic 7: Comparing Distributions II: Categorical Variables Activity 7-6: Suitability for Politics (continued) How can you compare and contrast categorical variables from two or more groups?

32 Essential Questions What is the concept of independence? How can you compare and contrast categorical variables from two or more groups?

33 Activity 7-5: Women Senators (pages 148 & 149) (a)

                Republicans   Democrats   row total
men                  52           39          91
women                 3            6           9
column total         55           45         100

34 Activity 7-5: Women Senators (pages 148 & 149) (a)

                Republicans   Democrats   row total
men                  52           39          91
women                 3            6           9
column total         55           45         100

(b) Republicans have 95% men and 5% women. Democrats have 87% men and 13% women. The Democratic party has the higher proportion of women among its Senators.

35 Republicans have 95% men and 5% women. Democrats have 87% men and 13% women. [Segmented bar graph: vertical axis from 0% to 100%; one bar per party (Republicans, Democrats); segments: men, women.]

36 (d) There were more Republicans than Democrats in the 1999 U.S. Senate. There were more total female Senators in the Democratic party than in the Republican party. Consequently, the Democrats had a higher proportion of women among their Senators.

37 Two categorical variables are said to be independent if the conditional distributions of one variable are identical for every category of the other variable. (e) The variables gender and party are not independent among members of the 1999 U.S. Senate because there is a difference in the gender breakdown between the two parties.
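The definition translates directly into a check: compute the conditional distribution of gender within each party and see whether they match. A minimal Python sketch using the 1999 Senate counts from the two-way table (the function names and the tolerance `tol` are illustrative assumptions):

```python
# 1999 U.S. Senate counts from the two-way table, grouped by party.
senate = {
    "Republicans": {"men": 52, "women": 3},
    "Democrats": {"men": 39, "women": 6},
}


def conditional_by_group(table):
    """Conditional distribution of the inner variable within each group."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {key: n / total for key, n in counts.items()}
    return result


def is_independent(table, tol=1e-9):
    """True if every group has the same conditional distribution."""
    distributions = list(conditional_by_group(table).values())
    first = distributions[0]
    return all(
        abs(dist[key] - first[key]) <= tol
        for dist in distributions[1:]
        for key in first
    )


for party, dist in conditional_by_group(senate).items():
    print(party, {key: round(p, 3) for key, p in dist.items()})
print("independent:", is_independent(senate))
```

For observed data an exact match is rare, so in practice one looks at how far apart the conditional distributions are rather than testing for strict equality.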

38 (f)

                Republicans   Democrats   row total
men                  48           32          80
women                12            8          20
column total         60           40         100

These numbers achieve the same gender percentages in both groups: Women: 20% and Men: 80%. Total women: 20 ÷ 100 = 0.20 and total men: 80 ÷ 100 = 0.80. ∴ Republican women: 0.20 × 60 = 12 and Republican men: 0.80 × 60 = 48; Democrat women: 0.20 × 40 = 8 and Democrat men: 0.80 × 40 = 32.
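The arithmetic above follows one rule: under independence, each cell equals its row total times its column total, divided by the grand total. A short Python sketch of that rule, using the margins from the table (the function name `independent_table` is an illustrative choice):

```python
def independent_table(row_totals, col_totals):
    """Fill a two-way table so the two variables are exactly independent.

    Each cell is row total * column total / grand total.
    """
    grand_total = sum(row_totals.values())
    if grand_total != sum(col_totals.values()):
        raise ValueError("row and column totals must add to the same grand total")
    return {
        row: {col: r * c / grand_total for col, c in col_totals.items()}
        for row, r in row_totals.items()
    }


table = independent_table(
    {"men": 80, "women": 20},
    {"Republicans": 60, "Democrats": 40},
)
for row, cells in table.items():
    print(row, cells)
```

The cells come out as whole numbers here only because the margins were chosen conveniently; other margins can give fractional counts.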

39 Women: 20% and Men: 80%. [Segmented bar graph: vertical axis from 0% to 100%; one bar per party (Republicans, Democrats); segments: men, women.] Independent Variables

40 WRAP-UP With this topic we have concluded our investigation of distributions of data. This topic has differed from earlier ones in that it has dealt exclusively with categorical variables. The most important technique that this topic has covered has involved interpreting information presented in two-way tables.

41 You have encountered the ideas of marginal distributions and conditional distributions, and you have learned to draw bar graphs and segmented bar graphs to display these distributions. You have explored the notion of relative risk, and you have discovered and explained the phenomenon known as Simpson’s paradox, which raises interesting issues with regard to analyzing two-way tables.

42 Comparing distributions of categorical variables can also be thought of as exploring relationships between those variables. The next unit will be devoted to exploring relationships between quantitative variables. You will find that our approach will again involve starting with graphical displays (scatterplots) and proceeding to numerical summaries (correlation). You will then study a technique for predicting one variable from the value of another (regression).

43 Assignment Activity 7-25: Politics and Ice Cream (pages 158 & 159) Prepare for Quiz on Topic 7: Comparing Distributions II: Categorical Variables Activity 7-18: Lifetime Achievements (continued) Assignment Activity 7-23: Hypothetical Employee Retention Predictions (pages 157 & 158) How can you compare and contrast categorical variables from two or more groups?

44 Your topic is due! Prepare for a Test on Topics 6 & 7 Quiz on Topic 7: Comparing Distributions II: Categorical Variables Activity 7-18: Lifetime Achievements (continued) How can you compare and contrast categorical variables from two or more groups?

