# EPI809/Spring 2008 Chapter 10 Hypothesis Testing: Categorical Data Analysis


Slide 2: Learning Objectives

1. Compare binomial proportions using the Z test and the $\chi^2$ test
2. Explain the $\chi^2$ test for independence of two variables
3. Explain Fisher's test for independence
4. McNemar's test for correlated data
5. Kappa statistic
6. Use of SAS PROC FREQ

Slide 3: Data Types

Slide 4: Qualitative Data

1. Qualitative random variables yield responses that can be put in categories. Example: gender (male, female).
2. Measurements or counts reflect the number of observations in each category.
3. The scale is nominal (no order) or ordinal (ordered).
4. Data can be collected as continuous and then recoded as categorical. Example: systolic blood pressure recoded as hypotension, normal tension, or hypertension.

Slide 5: Hypothesis Tests: Qualitative Data

Slide 6: Z Test for Differences in Two Proportions

Slides 7-12: Hypotheses for Two Proportions

Slide 13: Z Test for Difference in Two Proportions

1. Assumptions
   - Populations are independent
   - Populations follow the binomial distribution
   - The normal approximation can be used for large samples (all expected counts $\geq 5$)
2. Z-test statistic for two proportions
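The formula itself is not preserved in this transcript. A standard form of the statistic, assuming the pooled estimate $\hat{p}$ is used for the standard error under $H_0$, is sketched below. This assumption reproduces the $Z = -4.00$ reported in the worked solution. The symbols $X_1, X_2$ (numbers of events) and $\hat{p}_1, \hat{p}_2$ (sample proportions) are notation introduced here, not taken from the slides.

```latex
Z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad
\hat{p}_1 = \frac{X_1}{n_1},\quad
\hat{p}_2 = \frac{X_2}{n_2},\quad
\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}
```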

Slide 14: Sampling Distribution for the Difference Between Proportions

Slide 15: Z Test for Two Proportions: Thinking Challenge

You're an epidemiologist for the US Department of Health and Human Services. You're studying the prevalence of disease X in two states (MA and CA). In MA, 74 of 1500 people surveyed were diseased, and in CA, 129 of 1500 were diseased. At the .05 level, does MA have a lower prevalence rate?

Slides 16-24: Z Test for Two Proportions: Solution

- $H_0$: $p_{MA} - p_{CA} = 0$
- $H_a$: $p_{MA} - p_{CA} < 0$
- $\alpha = .05$
- $n_{MA} = 1500$, $n_{CA} = 1500$
- Critical value: $Z = -1.645$ (lower-tailed test)
- Test statistic: $Z = -4.00$
- Decision: Reject $H_0$ at $\alpha = .05$
- Conclusion: There is evidence that the prevalence in MA is lower than in CA
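The test statistic for this example can be checked with a short calculation. The sketch below is written in Python rather than the SAS used in the course, and it assumes the pooled-proportion form of the Z statistic. The function names `two_proportion_z` and `normal_cdf` are illustrative, not from the slides.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for H0: p1 - p2 = 0 (assumed form)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both samples share one proportion, so pool the events.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# MA: 74 of 1500 diseased; CA: 129 of 1500 diseased.
z = two_proportion_z(74, 1500, 129, 1500)
p_value = normal_cdf(z)  # lower-tailed test, Ha: p_MA - p_CA < 0

print(f"Z = {z:.2f}")  # Z = -4.00
print(f"reject H0 at alpha = .05: {z < -1.645}")
print(f"one-sided p-value = {p_value:.6f}")
```

Because $Z$ falls well below the lower-tail critical value of $-1.645$, the calculation agrees with the decision to reject $H_0$.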

Slide 25: $\chi^2$ Test of Independence Between 2 Categorical Variables

Slide 26: Hypothesis Tests: Qualitative Data

Slide 27: $\chi^2$ Test of Independence

1. Shows whether a relationship exists between 2 qualitative variables, but does not show causality
2. Assumptions
   - Multinomial experiment
   - All expected counts $\geq 5$
3. Uses a two-way contingency table

Slides 28-29: $\chi^2$ Test of Independence: Contingency Table

1. Shows the number of observations from 1 sample jointly classified by 2 qualitative variables. The table is laid out with the levels of variable 1 along one margin and the levels of variable 2 along the other.

Slides 30-32: $\chi^2$ Test of Independence: Hypotheses & Statistic

1. Hypotheses
   - $H_0$: Variables are independent
   - $H_a$: Variables are related (dependent)
2. Test statistic

   $$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})}$$

   where $n_{ij}$ is the observed count and $E(n_{ij})$ is the expected count in cell $(i, j)$.
   - Degrees of freedom: $(r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns

Slide 33: $\chi^2$ Test of Independence: Expected Counts

1. Statistical independence means the joint probability equals the product of the marginal probabilities
2. Compute the marginal probabilities and multiply them for the joint probability
3. The expected count is the sample size times the joint probability

Slides 34-38: Expected Count Example

- Marginal probabilities: $\frac{112}{160}$ and $\frac{78}{160}$
- Joint probability: $\frac{112}{160} \cdot \frac{78}{160}$
- Expected count: $160 \cdot \frac{112}{160} \cdot \frac{78}{160} = 54.6$

Slides 39-41: Expected Count Calculation

Each expected count is the product of the corresponding pair of marginal totals divided by the sample size:

- $\frac{112 \times 78}{160}$
- $\frac{112 \times 82}{160}$
- $\frac{48 \times 78}{160}$
- $\frac{48 \times 82}{160}$
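The same arithmetic can be sketched in a few lines of Python. Which totals belong to rows and which to columns is not recoverable from the transcript, so treating 112 and 48 as row totals and 78 and 82 as column totals is an assumption, and `expected_counts` is an illustrative name.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Assumed layout: row totals (112, 48), column totals (78, 82), n = 160.
expected = expected_counts([112, 48], [78, 82])

for row in expected:
    print([round(e, 1) for e in row])
```

The first cell is the $54.6$ from the expected count example, and the four expected counts sum back to the sample size of 160.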

Slide 42: $\chi^2$ Test of Independence: Example on HIV

You randomly sample 286 sexually active individuals and collect information on their HIV status and history of STDs. At the .05 level, is there evidence of a relationship?

Slides 43-52: $\chi^2$ Test of Independence: Solution

- $H_0$: No relationship
- $H_a$: Relationship
- $\alpha = .05$
- $df = (2 - 1)(2 - 1) = 1$
- Critical value: $\chi^2 = 3.841$
- Expected counts, with $E(n_{ij}) \geq 5$ in all cells: $\frac{116 \times 132}{286}$, $\frac{154 \times 116}{286}$, $\frac{170 \times 132}{286}$, $\frac{170 \times 154}{286}$
- Test statistic: $\chi^2 = 54.29$
- Decision: Reject $H_0$ at $\alpha = .05$
- Conclusion: There is evidence of a relationship
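The slide reports $\chi^2 = 54.29$, while the SAS output for the same table gives 54.1502. A minimal Python sketch (not the course's SAS) suggests the gap comes from rounding the expected counts to one decimal in the hand calculation. That explanation is an inference from the arithmetic, not something the slides state. The observed counts are taken from the SAS data step in this transcript, and the `decimals` parameter is introduced here for illustration.

```python
def chi_square(observed, decimals=None):
    """Pearson chi-square for a two-way table of observed counts.

    If `decimals` is given, expected counts are rounded first, which
    mimics a hand calculation done with rounded intermediate values.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected count under independence
            if decimals is not None:
                e = round(e, decimals)
            stat += (observed[i][j] - e) ** 2 / e
    return stat


# Rows: STDs levels 1 and 2; columns: HIV levels 1 and 2.
observed = [[84, 32], [48, 122]]

exact = chi_square(observed)
rounded = chi_square(observed, decimals=1)

print(f"unrounded expected counts: chi-square = {exact:.4f}")
print(f"expected counts rounded to 1 decimal: chi-square = {rounded:.2f}")
print(f"exceeds critical value 3.841: {exact > 3.841}")
```

Either value is far above the critical value of 3.841 for 1 degree of freedom, so the decision to reject $H_0$ does not depend on the rounding.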

Slide 53: $\chi^2$ Test of Independence: SAS Code

```sas
/* One record per cell of the 2x2 table: STDs level, HIV level, cell count */
data dis;
  input STDs HIV count;
  cards;
1 1 84
1 2 32
2 1 48
2 2 122
;
run;

/* WEIGHT treats each record as `count` observations; CHISQ requests the chi-square tests */
proc freq data=dis order=data;
  weight count;
  tables STDs*HIV / chisq;
run;
```

Slide 54: $\chi^2$ Test of Independence: SAS Output

Statistics for Table of STDs by HIV

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 1 | 54.1502 | <.0001 |
| Likelihood Ratio Chi-Square | 1 | 55.7826 | <.0001 |
| Continuity Adj. Chi-Square | 1 | 52.3871 | <.0001 |
| Mantel-Haenszel Chi-Square | 1 | 53.9609 | <.0001 |
| Phi Coefficient | | 0.4351 | |
| Contingency Coefficient | | 0.3990 | |
| Cramer's V | | 0.4351 | |
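Several rows of this output follow directly from the Pearson chi-square and the sample size. The Python sketch below (again, not the course's SAS) applies the usual textbook definitions of these measures for a 2x2 table, with the chi-square value copied from the output above. It is offered as a consistency check, not as a description of how PROC FREQ computes them internally.

```python
import math

n = 286            # sample size from the HIV example
chi2 = 54.1502     # Pearson chi-square from the SAS output
n_rows, n_cols = 2, 2

# Phi coefficient (magnitude only; the sign depends on the table layout).
phi = math.sqrt(chi2 / n)

# Contingency coefficient.
contingency = math.sqrt(chi2 / (chi2 + n))

# Cramer's V; for a 2x2 table it has the same magnitude as phi.
cramers_v = math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

# For a 2x2 table the Mantel-Haenszel chi-square equals (n - 1) * phi^2.
mantel_haenszel = (n - 1) * chi2 / n

print(f"phi = {phi:.4f}")
print(f"contingency coefficient = {contingency:.4f}")
print(f"Cramer's V = {cramers_v:.4f}")
print(f"Mantel-Haenszel chi-square = {mantel_haenszel:.4f}")
```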
