Chapter 3 Displaying and Describing Categorical Data Math2200.


1 Chapter 3 Displaying and Describing Categorical Data Math2200

2 Categorical variable A categorical variable has only a finite number of possible values. Examples: gender, car size, course grade.

3 Titanic
WHO: People on the Titanic
WHAT: Survival status, age, sex, ticket class
WHY: Historical interest
WHEN: April 14, 1912
WHERE: North Atlantic
HOW: A variety of sources and Internet sites
Survived  Age    Sex     Class
Dead      Adult  Male    Third
Dead      Adult  Male    Crew
Dead      Adult  Male    Third
Dead      Adult  Male    Crew
Dead      Adult  Male    Crew
Dead      Adult  Male    Crew
Alive     Adult  Female  First
Dead      Adult  Male    Third
Dead      Adult  Male    Crew

4 Three rules of data analysis
1. Make a picture: a picture can reveal patterns and relationships hidden in your data.
2. Make a picture: a picture can show extraordinary data values or unexpected patterns.
3. Make a picture: a picture is easy to understand.

5 Florence Nightingale
Founder of modern nursing.
First female member of the Royal Statistical Society.
Used a picture to argue forcefully for better hospital conditions for soldiers.

6

7 Frequency tables
Making piles: count the number of cases corresponding to each category and pile them up.
People on the Titanic, by ticket class:
Class   Count
First   325
Second  285
Third   706
Crew    885
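"Making piles" can be sketched in a few lines of Python. This is only an illustration: it counts the ticket class of the nine sample rows shown on the Titanic slide, not the full data set, and the names `sample_classes` and `frequency_table` are made up for the example.

```python
from collections import Counter

# Ticket class of the nine sample rows shown on the Titanic slide
# (a tiny excerpt, not the full 2201 cases).
sample_classes = [
    "Third", "Crew", "Third", "Crew", "Crew",
    "Crew", "First", "Third", "Crew",
]

def frequency_table(values):
    """Count how many cases fall into each category."""
    return dict(Counter(values))

sample_counts = frequency_table(sample_classes)
for category, count in sample_counts.items():
    print(f"{category:<7}{count}")
```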

8 Relative frequency table
Proportion: divide each count by the total number of cases.
Percentage: multiply the proportion by 100.
A frequency table or relative frequency table describes the distribution of a categorical variable.
Class   Percentage
First   14.77%
Second  12.95%
Third   32.08%
Crew    40.21%
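The step from counts to percentages can be sketched as follows, using the ticket-class counts from the frequency table slide. The function name `relative_frequencies` is an assumption for illustration.

```python
# Counts by ticket class, taken from the frequency table slide.
class_counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

def relative_frequencies(counts):
    """Convert counts to percentages of the total number of cases."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

percentages = relative_frequencies(class_counts)
for category, pct in percentages.items():
    print(f"{category:<7}{pct:.2f}%")
```

Rounded to two decimals, the results match the percentages on the slide.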

9 What is your feeling about the proportion of crew members on board?

10 Why is the picture misleading? The length of each ship corresponds to the number of people in each category, but our eyes tend to be more impressed by the area than by other aspects of the image. A ship that is about 3 times as long also covers about 9 times the area, and that is misleading.

11 The area principle The area occupied by a part of the graph should correspond to the magnitude of the value it represents.
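The arithmetic behind the area principle can be sketched in one line, assuming a picture is enlarged by the same factor in both height and width. The function name `area_ratio` is illustrative only.

```python
def area_ratio(length_ratio):
    """Area ratio of two similar pictures whose lengths differ by length_ratio."""
    return length_ratio ** 2

# A picture drawn 3 times as long (and 3 times as tall) covers 9 times the area.
print(area_ratio(3))
```

A bar chart avoids the problem because all bars share the same width, so area grows in proportion to length.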

12 Bar chart Display of counts of a categorical variable with bars

13 Pie Charts

14 When you make a bar chart or pie chart, check the following:
Make sure the variable is indeed categorical.
Make sure your data are counts or percentages of cases in categories.
Make sure that the categories do not overlap.

15 Was there a relationship between the kind of ticket a passenger held and the passenger’s chances of making it into a lifeboat? What table should we make to answer this question?

16 Contingency table
A two-way table. The table shows how the subjects are distributed along each variable, contingent on the value of the other variable.
       First  Second  Third  Crew  Total
Alive  203    118     178    212   711
Dead   122    167     528    673   1490
Total  325    285     706    885   2201

17 Add relative frequencies
                    First    Second   Third    Crew     Total
Alive  Count        203      118      178      212      711
       % of Row     28.55%   16.60%   25.04%   29.82%   100.00%
       % of Column  62.46%   41.40%   25.21%   23.95%   32.30%
       % of Table   9.22%    5.36%    8.09%    9.63%    32.30%
Dead   Count        122      167      528      673      1490
       % of Row     8.19%    11.21%   35.44%   45.17%   100.00%
       % of Column  37.54%   58.60%   74.79%   76.05%   67.70%
       % of Table   5.54%    7.59%    23.99%   30.58%   67.70%
Total  Count        325      285      706      885      2201
       % of Row     14.77%   12.95%   32.08%   40.21%   100.00%
       % of Column  100.00%  100.00%  100.00%  100.00%  100.00%
       % of Table   14.77%   12.95%   32.08%   40.21%   100.00%

18 Percent of what?
What percent of the survivors were in second class? 118/711 = 16.60%
What percent were second-class passengers who survived? The Who is everyone on board, i.e., 2201 is the denominator: 118/2201 = 5.36%
What percent of the second-class passengers survived? 118/285 = 41.40%
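The three questions differ only in the denominator. A minimal sketch using the counts from the contingency table (the variable and function names are assumptions for illustration):

```python
# Counts from the contingency table slide.
alive = {"First": 203, "Second": 118, "Third": 178, "Crew": 212}
dead = {"First": 122, "Second": 167, "Third": 528, "Crew": 673}

def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole

survivors_total = sum(alive.values())                  # all survivors
second_class_total = alive["Second"] + dead["Second"]  # everyone in second class
everyone = survivors_total + sum(dead.values())        # everyone on board

# Same numerator (118 second-class survivors), three different denominators.
pct_of_survivors = percent(alive["Second"], survivors_total)
pct_of_everyone = percent(alive["Second"], everyone)
pct_of_second_class = percent(alive["Second"], second_class_total)

print(f"Of the survivors, in second class:      {pct_of_survivors:.2f}%")
print(f"Of everyone, second class and survived: {pct_of_everyone:.2f}%")
print(f"Of second class, survived:              {pct_of_second_class:.2f}%")
```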

19 A simplified table
       First   Second  Third   Crew    Total
Alive  9.22%   5.36%   8.09%   9.63%   32.30%
Dead   5.54%   7.59%   23.99%  30.58%  67.70%
Total  14.77%  12.95%  32.08%  40.21%  100.00%

20 Marginal distribution The frequency distribution of one of the variables, found in the margins (the Total row or Total column) of a contingency table, is called its marginal distribution.
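As a rough sketch, the marginal distributions are just the row totals and column totals of the contingency table. The nested-dictionary layout and the function name `marginals` are assumptions made for this example.

```python
# Contingency table of counts: survival status by ticket class.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

def marginals(table):
    """Return (row totals, column totals) of a two-way table of counts."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals

survival_margin, class_margin = marginals(table)
print(survival_margin)  # marginal distribution of survival status
print(class_margin)     # marginal distribution of ticket class
```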

21 Conditional distribution 1
       First   Second  Third   Crew    Total
Alive  203     118     178     212     711
       28.55%  16.60%  25.04%  29.82%  100.00%
Dead   122     167     528     673     1490
       8.19%   11.21%  35.44%  45.17%  100.00%

22 Pie chart for conditional distributions of ticket Class for survivors and non-survivors

23 Conditional distribution 2
                    First    Second   Third    Crew     Total
Alive  Count        203      118      178      212      711
       % of Column  62.46%   41.40%   25.21%   23.95%   32.30%
Dead   Count        122      167      528      673      1490
       % of Column  37.54%   58.60%   74.79%   76.05%   67.70%
Total  Count        325      285      706      885      2201
       % of Column  100.00%  100.00%  100.00%  100.00%  100.00%
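Both kinds of conditional distribution (ticket class given survival status, and survival status given ticket class) can be sketched as row and column percentages. The table layout and the function names below are assumptions for illustration.

```python
# Contingency table of counts: survival status by ticket class.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

def row_conditionals(table):
    """Percent distribution of the column variable within each row."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

def column_conditionals(table):
    """Percent distribution of the row variable within each column."""
    columns = list(next(iter(table.values())))
    result = {}
    for col in columns:
        total = sum(cells[col] for cells in table.values())
        result[col] = {row: 100 * cells[col] / total for row, cells in table.items()}
    return result

class_given_survival = row_conditionals(table)     # "% of Row"
survival_given_class = column_conditionals(table)  # "% of Column"

for ticket_class, dist in survival_given_class.items():
    print(f"{ticket_class:<7} alive {dist['Alive']:.2f}%  dead {dist['Dead']:.2f}%")
```

Comparing the survival percentages across classes is what reveals the relationship between ticket class and survival.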

24 Bar chart for conditional distributions of Ticket Class

25 Segmented Bar Chart

26 What can go wrong? Do not violate the area principle. [Figure: an incorrect display next to a correct one]

27 What can go wrong?
Keep it honest: pay attention to labels, and check whether the percentages add up to 100%.
Do not confuse similar-sounding percentages:
The percentage of passengers who were both in first class and survived.
The percentage of the first-class passengers who survived.
The percentage of the survivors who were in first class.

28 What can go wrong?
Do not forget to look at the variables separately, too: look at both conditional and marginal distributions.
Be sure to use enough individuals.
Do not overstate your case.

29 What can go wrong?
Be careful with averages of proportions across several different groups: Simpson’s Paradox. (The calculation in the last column makes no sense.)
On-time record for two pilots:
      Day           Night         Overall
Moe   90/100 = 90%  10/20 = 50%   100/120 = 83%
Jill  19/20 = 95%   75/100 = 75%  94/120 = 78%
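The reversal in the pilots table can be checked with a short sketch. The data come from the slide; the structure and the names `records`, `rate`, and `overall_rate` are assumptions for illustration.

```python
# (on-time flights, total flights) for each pilot, from the slide's table.
records = {
    "Moe": {"Day": (90, 100), "Night": (10, 20)},
    "Jill": {"Day": (19, 20), "Night": (75, 100)},
}

def rate(on_time, flights):
    """On-time percentage for one group of flights."""
    return 100 * on_time / flights

def overall_rate(by_period):
    """On-time percentage after pooling day and night flights."""
    on_time = sum(o for o, _ in by_period.values())
    flights = sum(f for _, f in by_period.values())
    return rate(on_time, flights)

for pilot, by_period in records.items():
    parts = ", ".join(
        f"{period} {rate(*cell):.0f}%" for period, cell in by_period.items()
    )
    print(f"{pilot}: {parts}, Overall {overall_rate(by_period):.0f}%")
```

Jill has the better record both by day and by night, yet Moe looks better overall, because most of Moe's flights were the easier daytime ones while most of Jill's were at night.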

30 Summary Chapter 3
Bar charts and pie charts are displays for categorical variables.
A contingency table shows how cases are distributed along each variable, conditioned on the other variable.
The row and column sums of the table percentages in a contingency table give the marginal distributions.
The row and column percentages in a contingency table show the conditional distributions.
Contingency tables help to show the relationship between two categorical variables.

