The Practice of Statistics in the Life Sciences Third Edition

5. Two-way tables
The Practice of Statistics in the Life Sciences, Third Edition, © 2014 W.H. Freeman and Company

Objectives (PSLS Chapter 5)
- Two-way tables
- Marginal distributions
- Conditional distributions
- Simpson's paradox

Two-way tables
Two-way tables summarize data about two categorical variables (or factors) collected on the same set of individuals. Each factor can have any number of levels. If the row factor has r levels and the column factor has c levels, we say that the two-way table is an "r by c" table.

High school students were asked whether they smoke and whether their parents smoke:
First factor (rows): parental smoking status
Second factor (columns): student smoking status

                        Student smokes   Student does not smoke
Both parents smoke            400                 1380
One parent smokes             416                 1823
Neither parent smokes         188                 1168

This is a 3 by 2 table. Two-way tables are so named because there are two ways to group the data: by the row variable or by the column variable.
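The counts in a two-way table come from tallying one record per individual. A minimal Python sketch of that tallying step follows; the six records and the short level labels are hypothetical, made up only to show how pairs of categorical values become an r by c table of counts.

```python
from collections import Counter

# Hypothetical survey records: one (parental status, student status) pair per student.
RECORDS = [
    ("both", "smokes"),
    ("both", "does not smoke"),
    ("one", "smokes"),
    ("one", "does not smoke"),
    ("neither", "does not smoke"),
    ("neither", "does not smoke"),
]

ROW_LEVELS = ["both", "one", "neither"]     # row factor: parental smoking (r = 3)
COL_LEVELS = ["smokes", "does not smoke"]   # column factor: student smoking (c = 2)


def cross_tabulate(pairs, row_levels, col_levels):
    """Return one list of counts per row level, one count per column level."""
    counts = Counter(pairs)
    # A Counter returns 0 for combinations that never occur, so empty cells show as 0.
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


table = cross_tabulate(RECORDS, ROW_LEVELS, COL_LEVELS)
print(f"{len(table)} by {len(table[0])} table")
for level, row in zip(ROW_LEVELS, table):
    print(f"{level:>8}: {row}")
```

Fed the full set of student records, the same function would produce a 3 by 2 table like the one above; listing the levels explicitly keeps the row and column order fixed even when some combination never occurs.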

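Marginal distributions
We can examine each factor in a two-way table separately by studying the row totals and the column totals. They represent the marginal distributions, expressed in counts or percents. The name "marginal" refers to the fact that the row and column totals are written in the margins of the table.

                        Student smokes   Student does not smoke   Total
Both parents smoke            400                 1380             1780
One parent smokes             416                 1823             2239
Neither parent smokes         188                 1168             1356
Total                        1004                 4371             5375

The Total column on the right is the marginal distribution for parental smoking; the Total row at the bottom is the marginal distribution for student smoking.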
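The row and column totals can be checked with a few lines of Python. This is only a sketch that stores the survey counts as a list of rows, one row per level of parental smoking.

```python
# Counts from the smoking survey: rows are parental smoking status,
# columns are (student smokes, student does not smoke).
ROW_LABELS = ["Both parents smoke", "One parent smokes", "Neither parent smokes"]
COUNTS = [
    [400, 1380],
    [416, 1823],
    [188, 1168],
]

# Marginal distribution of parental smoking: add across each row.
row_totals = [sum(row) for row in COUNTS]
# Marginal distribution of student smoking: zip(*COUNTS) walks down each column.
col_totals = [sum(col) for col in zip(*COUNTS)]
grand_total = sum(row_totals)

for label, total in zip(ROW_LABELS, row_totals):
    print(f"{label}: {total}")
print(f"Student smokes: {col_totals[0]}, student does not smoke: {col_totals[1]}")
print(f"Grand total: {grand_total}")
```

Both margins add up to the same grand total because every student is counted exactly once in each.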

Computing marginal percents
Marginal percents are marginal counts divided by the table grand total.

                        Student smokes   Student does not smoke   Total
Both parents smoke            400                 1380             33.1%
One parent smokes             416                 1823             41.7%
Neither parent smokes         188                 1168             25.2%
Total                        18.7%                81.3%            100%
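Converting marginal counts to percents is a single division by the grand total. The sketch below starts from the marginal counts (row totals 1780, 2239, 1356 and column totals 1004, 4371, obtained by adding the cells of the smoking table) and rounds to one decimal place as the slide does.

```python
def marginal_percents(totals):
    """Express each marginal count as a percent of the grand total, to 0.1%."""
    grand_total = sum(totals)
    return [round(100 * t / grand_total, 1) for t in totals]


parental = marginal_percents([1780, 2239, 1356])  # both / one / neither parent smokes
student = marginal_percents([1004, 4371])         # student smokes / does not smoke

print(parental)  # [33.1, 41.7, 25.2]
print(student)   # [18.7, 81.3]
```

Each list sums to 100% (up to rounding) because both margins share the grand total of 5375 students.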

Graphs
The marginal distributions can be displayed on separate bar graphs, typically expressed as percents instead of raw counts. Each graph represents only one of the two variables, ignoring the other. Each marginal distribution can also be shown in a pie chart.
[Figure: bar graphs of the marginal distributions, one for parental smoking and one for student smoking]

Conditional distributions
A conditional distribution is the distribution of one factor for each level of the other factor. A conditional percent is computed using the counts within a single row or a single column; the denominator is the corresponding row or column total rather than the table grand total.

For example, to find the percent of students who smoke among those whose parents both smoke, we divide the count of smoking students in the "both parents smoke" row by that row's total. In essence, we are computing the percent who smoke for the subgroup of students in this study whose parents both smoke:
Percent of students who smoke when both parents smoke = 400/1780 = 22.5%
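The only thing that changes between a marginal and a conditional percent is the denominator. Written out for the example above:

```latex
\text{conditional percent} = \frac{\text{cell count}}{\text{row total}} \times 100\%,
\qquad
\frac{400}{400 + 1380} \times 100\% = \frac{400}{1780} \times 100\% \approx 22.5\%
```

Dividing the same count by the grand total instead (400/5375 ≈ 7.4%, where 5375 is the sum of all six cells) would answer a different question: what fraction of all surveyed students are smokers whose parents both smoke.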

Comparing conditional distributions
Comparing conditional distributions helps us describe the "relationship" between the two categorical variables. We can compare the percent of individuals in one level of factor 1 for each level of factor 2. Substantial differences suggest an association between factor 1 and factor 2.

Conditional distribution of student smokers for different parental smoking statuses:
Percent of students who smoke when both parents smoke = 400/1780 = 22.5%
Percent of students who smoke when one parent smokes = 416/2239 = 18.6%
Percent of students who smoke when neither parent smokes = 188/1356 = 13.9%

Notice that the percent of students who smoke is highest among students whose parents both smoke and lowest among students whose parents do not smoke. This indicates an association between parental smoking and student smoking: parental smoking may influence a student's decision to smoke.
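The three conditional percents above, together with their complements (the percents who do not smoke), come from dividing each row by its own total. A minimal sketch:

```python
ROW_LABELS = ["Both parents smoke", "One parent smokes", "Neither parent smokes"]
# Columns: (student smokes, student does not smoke)
COUNTS = [
    [400, 1380],
    [416, 1823],
    [188, 1168],
]


def row_conditional_percents(counts):
    """Divide every cell by its own row total, not by the grand total."""
    result = []
    for row in counts:
        row_total = sum(row)
        result.append([100 * cell / row_total for cell in row])
    return result


cond = row_conditional_percents(COUNTS)
for label, (smoke, no_smoke) in zip(ROW_LABELS, cond):
    print(f"{label}: {smoke:.1f}% smoke, {no_smoke:.1f}% do not smoke")
```

Each row of `cond` sums to 100%, which is why a graph can show only the "smoke" percent and leave the other implied.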

Graphs
The conditional distributions can be compared graphically by displaying the percents making up one factor for each level of the other factor.

Conditional distribution of student smoking status for different levels of parental smoking status:
                        Percent who smoke   Percent who do not smoke   Row total
Both parents smoke             22%                    78%                 100%
One parent smokes              19%                    81%                 100%
Neither parent smokes          14%                    86%                 100%

[Figure: bar graph of the percent of students who smoke for each level of parental smoking]
Notice that only the percents of students who smoke are shown in the bar graph. That is because the percents who do not smoke are implied: if 22% of students whose parents both smoke also smoke, then the remaining 78% of these students do not smoke.

The full set of percents can be displayed in a set of pie charts, one for each level of the condition (here, parental smoking status).
[Figure: pie charts of student smoking status, one for each level of parental smoking status]

Conditional distribution of parental smoking status for different levels of student smoking status:
                        Student smokes   Student does not smoke
Both parents smoke            40%                  32%
One parent smokes             41%                  42%
Neither parent smokes         19%                  27%
Column total                 100%                 100%

For each two-way table, there are two conditional distributions. Sometimes both are interesting, and sometimes only one truly interests us. In the smoking study, the second conditional distribution (given student smoking status) is less interesting because we would expect parents to influence their children's decision to smoke, but not vice versa.
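Conditioning on the column variable instead means dividing each cell by its column total. The sketch below reproduces the rounded percents in this table from the original survey counts.

```python
# Rows: both / one / neither parent smokes. Columns: student smokes / does not smoke.
COUNTS = [
    [400, 1380],
    [416, 1823],
    [188, 1168],
]


def column_conditional_percents(counts):
    """Divide every cell by the total of the column it sits in."""
    col_totals = [sum(col) for col in zip(*counts)]
    return [[100 * cell / total for cell, total in zip(row, col_totals)] for row in counts]


cond = column_conditional_percents(COUNTS)
for label, row in zip(["Both parents", "One parent", "Neither parent"], cond):
    print(label, [round(p) for p in row])
```

Here each column of `cond`, not each row, sums to 100%.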

A 2013 Gallup survey investigated how phrasing may affect the opinions of American adults regarding physician-assisted suicide. Here are the findings:
[Table: findings of the 2013 Gallup survey]

The value 70% is:
(a) a marginal value representing the proportion of respondents in favor of physician-assisted suicide.
(b) a conditional value representing the proportion of respondents in favor of physician-assisted suicide, given that the question was asked in Form A.

Answer: The value 70% represents the percent of respondents in favor when the question was phrased in Form A ("End the patient's life by some painless means"). It is therefore a conditional value, (b).

Simpson's paradox
Lurking variables are always a problem for interpretation, but their impact can be even more drastic when dealing with categorical data. An association that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson's paradox.

The table below compares the failure rates when removing kidney stones in a sample of patients, using one of two procedures: open surgery and PCNL (percutaneous nephrolithotomy, a minimally invasive technique).

                Open surgery   PCNL
Success             273         289
Failure              77          61
Failure rate        22%         17%

Can you think of a possible lurking variable here?

The procedures are not chosen randomly by surgeons! In fact, the minimally invasive procedure is most likely used for smaller stones with a good chance of success, whereas open surgery is likely used for more problematic conditions. For small stones and for large stones alike, open surgery has a lower failure rate than PCNL. So why do the combined data suggest that PCNL is better? Because PCNL is used mainly for small stones, where the failure rate is generally low, while open surgery is used most often for large stones, which have a higher failure rate overall. Stone size is the lurking variable.
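The reversal can be checked numerically. The breakdown by stone size is not given in the slide text, so the sketch below assumes the figures usually quoted for this classic kidney-stone dataset (open surgery: 81 of 87 small and 192 of 263 large stones removed successfully; PCNL: 234 of 270 small and 55 of 80 large). They add up to the combined 273 and 289 successes out of 350 patients each, but should be checked against the original slide.

```python
# (successes, patients) by stone size and procedure.
# ASSUMPTION: these stratified counts are the commonly quoted values for this
# dataset, not taken from the slide text; they sum to 273/350 and 289/350.
DATA = {
    "small stones": {"open surgery": (81, 87), "PCNL": (234, 270)},
    "large stones": {"open surgery": (192, 263), "PCNL": (55, 80)},
}


def failure_rate(successes, patients):
    """Fraction of patients whose stone removal failed."""
    return (patients - successes) / patients


# Failure rate within each stone-size group.
for size, procedures in DATA.items():
    for procedure, (s, n) in procedures.items():
        print(f"{size}, {procedure}: failure rate {failure_rate(s, n):.1%}")

# Pool the groups, ignoring stone size, to rebuild the combined table.
combined = {}
for procedures in DATA.values():
    for procedure, (s, n) in procedures.items():
        prev_s, prev_n = combined.get(procedure, (0, 0))
        combined[procedure] = (prev_s + s, prev_n + n)

for procedure, (s, n) in combined.items():
    print(f"combined, {procedure}: {s}/{n} successes, failure rate {failure_rate(s, n):.1%}")
```

Under these assumed counts, open surgery has the lower failure rate within each group yet the higher one after pooling, because 263 of its 350 patients had large stones compared with 80 of 350 for PCNL.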

In New York State (excluding New York City), 1,359 white men and 121 black men died from prostate cancer in 1994. Based on how many white and black men lived there in 1994, the prostate cancer mortality rates were as follows:

All ages                        White        Black
Death from prostate cancer
  Yes                           1,359          121
  No                        4,736,887      418,871
  Total                     4,738,246      418,992
Rate per 100,000                 28.7         28.9

Cancer mortality rates are similar in both groups. But when the data are broken down by age group, we see that black men had a much higher rate of prostate cancer death than white men:

                              Under 65 years of age      Age 65 and older
                                White        Black       White      Black
Death from prostate cancer
  Yes                              76           18       1,282        102
  No                        4,177,823      396,899     559,075     21,973
  Total                     4,177,899      396,917     560,357     22,075
Rate per 100,000                  1.8          4.5       228.8      462.1

What is the source of this example of Simpson's paradox?

Age is the confounding variable here. In both age groups, black men had a higher rate of prostate cancer death. However, for both white men and black men, the rate of prostate cancer death was much higher for older men than for men under the age of 65. Simpson's paradox arises from the fact that the percent of white men was higher among the older men (96%) than among the younger men (91%). In other words, the white men in this population were older on average, and their larger share of high-risk older men pulls the overall white rate up to about the same level as the black rate.
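The rates in both tables and the 96% and 91% figures can be recomputed from the counts. One detail surfaces along the way: the age-specific counts do not add up exactly to the all-ages row (for example, 76 + 1,282 = 1,358 white deaths versus 1,359), so this sketch uses the all-ages counts as printed rather than summing the age groups.

```python
# (deaths, population) for New York State excluding New York City, 1994.
ALL_AGES = {"white": (1_359, 4_738_246), "black": (121, 418_992)}
BY_AGE = {
    "under 65": {"white": (76, 4_177_899), "black": (18, 396_917)},
    "65 and older": {"white": (1_282, 560_357), "black": (102, 22_075)},
}


def rate_per_100k(deaths, population):
    """Deaths per 100,000 people in the group."""
    return 100_000 * deaths / population


for race, (d, n) in ALL_AGES.items():
    print(f"all ages, {race}: {rate_per_100k(d, n):.1f} per 100,000")

for age, groups in BY_AGE.items():
    for race, (d, n) in groups.items():
        print(f"{age}, {race}: {rate_per_100k(d, n):.1f} per 100,000")

# Percent of the men in each age group who are white. The two racial groups
# differ in age composition, and that difference produces the paradox.
white_share = {
    age: 100 * groups["white"][1] / (groups["white"][1] + groups["black"][1])
    for age, groups in BY_AGE.items()
}
for age, pct in white_share.items():
    print(f"{age}: {pct:.0f}% white")
```

Within each age group the black rate is higher (4.5 versus 1.8, and 462.1 versus 228.8 per 100,000), yet the all-ages rates are nearly equal.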