Analysis of Categorical Data


1 Analysis of Categorical Data
Dr Siti Azrin Binti Ab Hamid
Unit Biostatistics and Research Methodology

2 Outline
- Types of categorical analysis
- Steps to analysis

3 Overview univariable analysis
Dependent variable | Independent variable | Number of groups in independent variable | Parametric test | Non parametric test
Numerical (one) | - | - | One sample t | Sign test
Numerical | Categorical | 2 groups (independent) | Independent t | Mann Whitney
Numerical | Categorical | 2 groups (dependent) | Paired t | Signed rank test
Numerical | Categorical | > 2 groups (independent) | One way ANOVA | Kruskal Wallis
Categorical | Categorical | (2 groups) | - | Chi square test, Fisher exact test, McNemar test

4 Introduction Categorical data analysis deals with discrete data that can be organized into categories. The data are organized into a contingency table.

5 Types of categorical data analysis
Type of analysis | Statistical test
One proportion | Chi-square goodness of fit
Two proportions, independent sample | Pearson chi-square / Fisher exact
Two proportions, dependent sample | McNemar test
Stratified sampling to control confounder | Mantel-Haenszel test

6 Hypothesis testing
Step 1: State the hypotheses
Step 2: Set the significance level
Step 3: Check the assumptions
Step 4: Perform the statistical analysis
Step 5: Make interpretation
Step 6: Draw conclusion

7 Contingency table
- Consists of two columns and two rows.
- Cells are labeled A through D.
- Columns and rows are added for labels.
- Row: independent variable / exposure / risk factors
- Column: dependent variable / outcome

8 Example of contingency table
Group | CHD present | CHD absent | Total
Smoker | 138 | 32 | 170
Non-smoker | 263 | 105 | 368
Total | 401 | 137 | 538
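As a small illustration of how the margins of this table are obtained, the sketch below stores the four cell counts from the slide and derives the row, column and grand totals. The dictionary layout and variable names are assumptions made for the example, not part of the presentation.

```python
# 2x2 contingency table from the slide:
# rows = exposure (smoking), columns = outcome (CHD).
table = {
    "Smoker": {"CHD present": 138, "CHD absent": 32},
    "Non-smoker": {"CHD present": 263, "CHD absent": 105},
}

# Row totals: sum across the outcome columns for each exposure group.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: sum each outcome column across the exposure groups.
col_totals = {}
for cells in table.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The column totals come out as 401 (CHD present) and 137 (CHD absent), which together with the row totals add up to the grand total of 538.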

9 Pearson Chi-square
- To test the association between two categorical variables
- Independent sample
- Result of test:
  - Not significant: no association
  - Significant: an association

10 Research Question
Is estrogen receptor status associated with breast cancer status?
Data: Breast cancer.sav

11 Step 1: State the hypothesis
H0: There is no association between estrogen receptor and breast cancer status.
HA: There is an association between estrogen receptor and breast cancer status.

12 Step 2: Set the significance level
α = 0.05

13 Step 3: Check the assumption
- Two variables are independent
- Two variables are categorical
- Cells with expected count < 5:
  - > 20% of cells: Fisher exact test
  - < 20% of cells: Pearson Chi-square
Expected count = (Row total x Column total) / Grand total
Variable | Breast Ca: Died | Breast Ca: Alive | Total
ER -ve | 310 | 28 | 338
ER +ve | 508 | 23 | 531
Total | 818 | 51 | 869

14 Step 3: Check the assumption
Variable | Breast Ca: Died | Breast Ca: Alive | Total
ER -ve | 310 (E = 318.2) | 28 (E = 19.8) | 338
ER +ve | 508 (E = 499.8) | 23 (E = 31.2) | 531
Total | 818 | 51 | 869
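The expected counts on this slide follow from the rule "row total x column total / grand total". A minimal sketch of that calculation is given below; the function name `expected_counts` and the list-of-lists layout are assumptions chosen for the example.

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: ER -ve, ER +ve. Columns: Died, Alive (counts from the slide).
observed = [[310, 28], [508, 23]]
expected = expected_counts(observed)

# Round to one decimal place, as on the slide.
rounded = [[round(e, 1) for e in row] for row in expected]
print(rounded)
```

All four expected counts are well above 5, so under the slide's rule the Pearson chi-square test is the appropriate choice for this table.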

15 Step 4: Statistical test
Calculate the Chi-square value:
$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.897$
$df = (R-1)(C-1) = (2-1)(2-1) = 1$
From the chi-square table, the p value lies between 0.01 and 0.02.
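The calculation can be reproduced with a short sketch using only the Python standard library. The function names are illustrative assumptions. The p-value helper is valid for df = 1 only, where the upper tail of the chi-square distribution equals `erfc(sqrt(chi2 / 2))`. With unrounded expected counts the statistic comes out at about 5.84, slightly below the 5.897 on the slide, which appears to be based on expected counts rounded to one decimal place; the p-value is about 0.016 in either case.

```python
import math


def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic, valid for df = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: ER -ve, ER +ve. Columns: Died, Alive (counts from the slide).
observed = [[310, 28], [508, 23]]
chi2, df = pearson_chi_square(observed)
p = p_value_df1(chi2)
print(round(chi2, 2), df, round(p, 3))
```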

16 Step 4: Statistical test
[Figure: software screenshot with numbered steps 1 to 10]

17 Step 5: Interpretation
p value = 0.016 < 0.05: reject H0, accept HA

18 Step 6: Conclusion
There is a significant association between estrogen receptor and breast cancer status using Pearson Chi-square test (p = 0.016).

19 Fisher’s Exact Test
- To test the association between two categorical variables
- Independent sample
- Sample sizes are small

20 Research Question
Is gender associated with coronary heart disease?
Data: CHD data.sav

21 Step 1: State the hypothesis
H0: There is no association between gender and coronary heart disease.
HA: There is an association between gender and coronary heart disease.

22 Step 2: Set the significance level
α = 0.05

23 Step 3: Check the assumption
- Two variables are independent
- Two variables are categorical
- Cells with expected count < 5:
  - > 20% of cells: Fisher exact test
  - < 20% of cells: Pearson Chi-square
Expected count = (Row total x Column total) / Grand total
Variable | CHD: Presence | CHD: Absent | Total
Male | 15 | 5 | 20
Female | 10 | 0 | 10
Total | 25 | 5 | 30

24 Step 3: Check the assumption
Variable | CHD: Presence | CHD: Absent | Total
Male | 15 (E = 16.7) | 5 (E = 3.3) | 20
Female | 10 (E = 8.3) | 0 (E = 1.7) | 10
Total | 25 | 5 | 30
2 cells (50%) have an expected count < 5.
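The rule for choosing between the two tests can be sketched as a small helper that counts the cells with an expected count below 5. The function names are assumptions made for the example. The female "absent" count of 0 is not printed in the transcript; it is inferred here from the expected counts and the totals (25 present out of 30 in all). The slide does not say what to do at exactly 20%, so the sketch treats that boundary as Pearson chi-square.

```python
def cells_below_threshold(observed, threshold=5):
    """Return (number, fraction) of cells whose expected count is < threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    small = sum(1 for e in expected if e < threshold)
    return small, small / len(expected)


def choose_test(observed):
    """More than 20% of cells with expected count < 5: use Fisher exact."""
    _, fraction = cells_below_threshold(observed)
    return "Fisher exact" if fraction > 0.20 else "Pearson chi-square"


# Rows: Male, Female. Columns: CHD present, CHD absent.
# The 0 is inferred from the margins, not read directly from the slide.
chd = [[15, 5], [10, 0]]
small, fraction = cells_below_threshold(chd)
print(small, fraction, choose_test(chd))
```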

25 Step 4: Statistical test
Calculate the Chi-square value:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
$df = (R-1)(C-1) = (2-1)(2-1) = 1$
From the chi-square table, the p value lies between 0.05 and 0.1.

26 Step 4: Statistical test
[Figure: software screenshot with numbered steps 1 to 10]

27 Step 5: Interpretation
p value = 0.140 > 0.05: do not reject H0
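A two-sided Fisher exact p-value for a 2x2 table can be sketched from the hypergeometric distribution using only the standard library. The function name is an assumption, and the two-sided definition used here (summing the probabilities of all tables that are no more likely than the observed one) is one common convention rather than necessarily the one used by the software in the presentation. The table uses a female "absent" count of 0, inferred from the totals on the earlier slide.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum over tables at least as unlikely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Male: 15 present, 5 absent. Female: 10 present, 0 absent (inferred).
p = fisher_exact_two_sided(15, 5, 10, 0)
print(round(p, 3))
```

Under these assumptions the result rounds to 0.140, in line with the p value reported on the slide.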

28 Step 6: Conclusion There is no significant association between gender and coronary heart disease using Fisher’s Exact test (p = 0.140).

29 McNemar Test
- Categorical data
- Dependent sample
  - Matched sample
  - Cross over design
  - Before & after (same subject)
- To determine whether the row and column marginal frequencies are equal (marginal homogeneity)

30 Hypotheses
The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same.
H0: $P_B = P_C$
HA: $P_B \neq P_C$
- A & D = concordant pairs
- B & C = discordant pairs
- A discordant pair is a pair with different outcomes

31 Research Question
Is type of mastectomy associated with 5-year survival proportion in patients with breast cancer?
The sample comprised breast cancer patients:
- matched for age (same decade of age)
- with the same clinical condition
Data: breast ca.sav

32 Step 1: State the hypothesis
H0: There is no association between type of mastectomy and 5-year survival proportion in patients with breast cancer.
HA: There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer.

33 Step 2: Set the significance level
α = 0.05

34 Step 3: Check the assumption
- Two variables are dependent
- Two variables are categorical

35 Step 4: Statistical test
$\chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|0 - 8| - 1)^2}{0 + 8} = 6.125$
$df = (R-1)(C-1) = (2-1)(2-1) = 1$
Calculated $\chi^2$ > tabulated $\chi^2$
*$\chi^2 = \frac{(|b - c| - 0.5)^2}{b + c}$
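The McNemar statistic depends only on the discordant pairs b and c. The sketch below computes the continuity-corrected chi-square from the slide and, as an addition that is not on this slide, an exact two-sided binomial p-value. The function name is an assumption. For b = 0 and c = 8 the exact p-value is about 0.008, which matches the p value reported in the interpretation step, whereas the chi-square approximation with df = 1 would give roughly 0.013.

```python
from math import comb


def mcnemar(b, c):
    """McNemar test from the discordant pair counts b and c.

    Returns the continuity-corrected chi-square statistic and the exact
    two-sided binomial p-value (discordant pairs ~ Binomial(b + c, 0.5)).
    """
    n = b + c
    chi2 = (abs(b - c) - 1) ** 2 / n
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return chi2, p_exact


chi2, p_exact = mcnemar(0, 8)
print(chi2, round(p_exact, 3))
```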

36 Step 4: Statistical test
[Figure: software screenshot with numbered steps 1 to 9]

37 Step 5: Interpretation
p value = 0.008 < 0.05: reject H0, accept HA

38 Step 6: Conclusion There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer using McNemar test (p = 0.008).

39 Cochran Mantel-Haenszel Test
- A method to compare the probability of an event among independent groups in stratified samples.
- The stratification factor can be study center, gender, race, age group, obesity status or disease severity.
- Gives a stratified statistical analysis of the relationship between exposure and disease, after controlling for a confounder (the stratification variable).
- The data are arranged in a series of associated 2 × 2 contingency tables.

40 Research Question
Is the type of treatment associated with response to treatment among migraine patients after controlling for gender?
Confounder: gender
Gender | Measure | Active | Placebo
Female | No of patients | 27 | 25
Female | No of better response | 16 | 5
Male | No of patients | 28 | 26
Male | No of better response | 12 | 7

41 Step 1: 2x2 contingency table
Strata | Treatment | Better | Same | Total
Strata 1: Female | Active | 16 | 11 | 27
Strata 1: Female | Placebo | 5 | 20 | 25
Strata 2: Male | Active | 12 | 16 | 28
Strata 2: Male | Placebo | 7 | 19 | 26

42 Step 2: Check the assumption
- Random sampling
- Stratified sampling

43 Step 3: State the hypothesis
H0: There is no association between type of treatment and response of treatment among female and male migraine patients.
HA: There is an association between type of treatment and response of treatment among female and male migraine patients.

44 Step 4: Statistical test
Compute the expected frequency from each stratum:
$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
Compute the variance for each stratum:
$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$
Compute the Mantel-Haenszel statistic:
$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i}$

45 Step 4: Statistical test
Compute the expected frequency from each stratum:
$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
$e_1 = \frac{(16 + 11)(16 + 5)}{52} = 10.904$
$e_2 = \frac{(12 + 16)(12 + 7)}{54} = 9.852$

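46 Step 4: Statistical test
Compute the variance for each stratum:
$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$
$v_1 = \frac{(16 + 11)(5 + 20)(16 + 5)(11 + 20)}{(52)^2 (52 - 1)} = 3.186$
$v_2 = \frac{(12 + 16)(7 + 19)(12 + 7)(16 + 19)}{(54)^2 (54 - 1)} = 3.132$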

47 Step 4: Statistical test
Compute the Mantel-Haenszel statistic:
$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i} = \frac{((16 + 12) - (10.904 + 9.852))^2}{3.186 + 3.132} = 8.31$
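The three quantities (expected frequency, variance and the pooled statistic) can be computed in one pass over the strata. The sketch below is a minimal standard-library version without a continuity correction, matching the formula on the slides; the function name and the `(a, b, c, d)` tuple layout are assumptions made for the example. The p-value line relies on df = 1.

```python
import math


def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel chi-square (no continuity correction).

    Each stratum is a tuple (a, b, c, d) for the 2x2 table
    [[a, b], [c, d]], with a = exposed subjects who have the outcome.
    """
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n  # expected frequency e_i
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    return (sum_a - sum_e) ** 2 / sum_v


# (Active better, Active same, Placebo better, Placebo same) per stratum.
strata = [
    (16, 11, 5, 20),  # Stratum 1: female
    (12, 16, 7, 19),  # Stratum 2: male
]
chi2_mh = mantel_haenszel_chi2(strata)
p_value = math.erfc(math.sqrt(chi2_mh / 2))  # upper tail for df = 1
print(round(chi2_mh, 2), round(p_value, 3))
```

With these inputs the statistic is about 8.31 and the p-value about 0.004, in line with the values quoted in the interpretation and conclusion slides.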

48 Step 4: Statistical test
Compute the odds ratio:
$OR_{MH} = \frac{\sum (a_i d_i / n_i)}{\sum (b_i c_i / n_i)} = \frac{(16 \times 20 / 52) + (12 \times 19 / 54)}{(11 \times 5 / 52) + (16 \times 7 / 54)} = 3.313$
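The pooled odds ratio can be checked with a few lines of code. This is a sketch under the same assumed `(a, b, c, d)` layout per stratum; the function name is illustrative.

```python
def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over all strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# (Active better, Active same, Placebo better, Placebo same) per stratum.
strata = [
    (16, 11, 5, 20),  # Stratum 1: female
    (12, 16, 7, 19),  # Stratum 2: male
]
or_mh = mantel_haenszel_odds_ratio(strata)
print(round(or_mh, 3))
```

Weighting each stratum by 1/n keeps a single small stratum from dominating the pooled estimate, which is why the result is not simply the average of the two stratum-specific odds ratios.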

49 Step 4: Statistical test
Data: Migraine.sav
[Figure: software screenshot with numbered steps 1 to 6]

50 Step 5: Interpretation
Compute the Mantel-Haenszel statistic:
$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i} = \frac{((16 + 12) - (10.904 + 9.852))^2}{3.186 + 3.132} = 8.31$
Calculated value > tabulated value: reject H0

51 Step 5: Interpretation
- Test of homogeneity of the odds ratios (Breslow-Day; *Tarone's adjusted): H0: OR1 = OR2 (association is homogeneous)
- Test of conditional independence: H0: OR1 = 1 and OR2 = 1 (conditionally independent)
- The large p-value for the Breslow-Day test (p = 0.222) indicates no significant gender difference in the odds ratios.

52 Step 6: Conclusion
There is a significant association between type of treatment and response of treatment among female and male migraine patients (p = 0.004). We estimate that female and male patients who receive the active treatment have 3.31 times the odds of a better migraine response compared with patients who receive placebo.

