WLS for Categorical Data
SAS – CATMOD Procedure
To fit a WLS model with PROC CATMOD:
Use the WEIGHT statement to specify the weight variable.
Use the WLS option in the MODEL statement to obtain WLS estimates.
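As background for the WLS option, here is a sketch of the weighted least squares (Grizzle-Starmer-Koch) estimator that CATMOD's WLS fitting is based on. It is written for a binary response such as REVSIB, with one logit per population (one combination of covariate levels). The notation (F, V, X, b, C, n_i, p_i, s) is ours. CATMOD treats the WEIGHT variable as cell frequencies, so n_i here is the weighted count in population i.

```latex
% Population i = 1, ..., s: weighted count n_i, observed proportion p_i of the modeled level
\[
F_i = \log\frac{p_i}{1 - p_i}, \qquad
\widehat{\operatorname{Var}}(F_i) = \frac{1}{n_i\, p_i\,(1 - p_i)}
\]
% Stack F = (F_1, ..., F_s)^T, let V be diagonal with entries Var(F_i), and fit F = X beta
\[
b = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} F, \qquad
\widehat{\operatorname{Cov}}(b) = \left(X^\top V^{-1} X\right)^{-1}
\]
% Wald statistic for an effect, H_0: C beta = 0, approximately chi-square with rank(C) DF
\[
Q_C = (Cb)^\top \left[ C \left(X^\top V^{-1} X\right)^{-1} C^\top \right]^{-1} (Cb)
\]
% Residual (goodness-of-fit) statistic, approximately chi-square with s - rank(X) DF
\[
Q_{\mathrm{res}} = (F - Xb)^\top V^{-1} (F - Xb)
\]
```

For WLS fits, the effect rows of an Analysis of Variance table are Wald statistics of the Q_C form, and the Residual row is Q_res.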
Data – Response
REVSIB: whether the investigation of the child also involves further investigation of the siblings
REVSIB = 0 (No), 1 (Yes)
Data – Covariates
q1a – Relationship to the child:
1 – Biological parent
2 – Common-law partner
3 – Foster parent
4 – Adoptive parent
5 – Step-parent
6 – Grandparent
7 – Other
Data – Covariates
q2a – Gender of the caregiver:
0 – Female
1 – Male
99 – No response
q3a – Age of the caregiver:
1 – Less than 19
2 – 19 to 21
3 – 22 to 25
4 – 26 to 30
5 – 31 to 40
6 – Over 40
99 – No response
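To PROC CATMOD, the "no response" code 99 is an ordinary numeric value. As coded, it would therefore enter a model as a separate category. If those records are to be excluded (one of the open questions at the end), the sketch below recodes 99 to a SAS missing value. CATMOD then omits observations with missing values in model variables, and those omissions are what the "Frequency Missing" line of the data summary reports. The output data set name T2_NR is hypothetical. So is the assumption that q3a_age, the variable used in the models, carries the same 99 code as q3a.

```sas
/* Sketch: treat the "no response" code 99 as missing so that PROC CATMOD
   excludes those records. T2_NR is a hypothetical data set name, and
   q3a_age is assumed to use the same 99 code as q3a. */
data t2_nr;
   set t2;
   if q2a = 99 then q2a = .;          /* gender: no response */
   if q3a_age = 99 then q3a_age = .;  /* age group: no response */
run;

/* Refit the saturated WLS model on the recoded data */
proc catmod data=t2_nr;
   weight wtr;
   model revsib = q1a|q2a|q3a_age / wls;
run;
quit;
```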
SAS Code
Saturated model:

proc catmod data=t2;
   weight wtr;                             /* wtr holds the weights */
   model revsib = q1a|q2a|q3a_age / wls;   /* all main effects and interactions, WLS estimation */
run;
quit;
Output
The CATMOD Procedure

Data Summary
Response            revsib    Response Levels          2
Weight Variable     wtr       Populations             28
Data Set            T2        Total Frequency    6821.55
Frequency Missing   59.54     Observations          1574
Analysis of Variance

Source             DF    Chi-Square    Pr > ChiSq
-------------------------------------------------
Intercept           1          3.70        0.0544
q1a                 5         12.89        0.0244
q2a                 1          0.18        0.6753
q1a*q2a             4*        18.74        0.0009
q3a_age             5         12.35        0.0303
q1a*q3a_age         7*        28.19        0.0002
q2a*q3a_age         3*         5.17        0.1598
q1a*q2a*q3a_age     2*        13.34        0.0013
Residual            0             .             .

NOTE: Effects marked with '*' contain one or more redundant or restricted parameters.

q2a is not significant on its own (p = 0.6753), yet the three-way interaction q1a*q2a*q3a_age is (p = 0.0013). How should this be handled? The residual has 0 DF because the model is saturated (28 parameters for 28 populations), so no goodness-of-fit test is available.
Maximum Likelihood Analysis of Variance

Source             DF    Chi-Square    Pr > ChiSq
-------------------------------------------------
Intercept           1       1727.82        <.0001
q1a                 0*            .             .
q2a                 0*            .             .
q1a*q2a             0*            .             .
q3a_age             1*            .             .
q1a*q3a_age         7*            .             .
q2a*q3a_age         1*            .             .
q1a*q2a*q3a_age     6*            .             .
Likelihood Ratio   12          0.00        1.0000

NOTE: Effects marked with '*' contain one or more redundant or restricted parameters.

Without the WEIGHT statement and the WLS option, these results cannot be interpreted: apart from the intercept, no effect has a test statistic.
Analysis of Maximum Likelihood Estimates

                                Standard       Chi-
Parameter            Estimate      Error     Square    Pr > ChiSq
-----------------------------------------------------------------
Intercept             -6.8146     0.1639    1727.82        <.0001
q1a            1       3.3370#         .          .             .
               3      19.7614#         .          .             .
               4     -29.8195#         .          .             .
               5       2.8181#         .          .             .
               6      -5.2236#         .          .             .
q2a            0      -4.8953#         .          .             .
q1a*q2a        1 0     5.2304#         .          .             .
               3 0   -19.0829#         .          .             .
               4 0    12.8882#         .          .             .
               5 0    -3.3065#         .          .             .
               6 0     5.6687#         .          .             .
q3a_age        1      12.6303#         .          .             .
               2      -0.0398      500.1       0.00        0.9999
               3      -3.9163#         .          .             .
               4     -15.1158#         .          .             .
               5       3.0629#         .          .             .

The estimates cannot be interpreted. All but one are flagged with '#' and have no standard error or test. The only one with a standard error (q3a_age = 2) has SE = 500.1.
Reduced Model
Try a model without q2a:

Analysis of Variance

Source             DF    Chi-Square    Pr > ChiSq
-------------------------------------------------
Intercept           1          6.51        0.0107
q1a                 5         15.88        0.0072
q3a_age             5        155.85        <.0001
q1a*q3a_age         7*        13.06        0.0707
Residual            0             .             .

The q1a*q3a_age interaction is not significant at the 5% level (p = 0.0707). Perhaps there is no interaction between the caregiver's relationship to the child and the caregiver's age group.
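The slides do not show the code for this model. Judging from the effects listed in the table, it would be requested roughly as follows. This sketch reuses the data set, weight variable and WLS option of the saturated model.

```sas
/* Reduced model: q2a dropped; q1a, q3a_age and their interaction kept.
   Inferred from the effects in the table, not taken from the original program. */
proc catmod data=t2;
   weight wtr;
   model revsib = q1a|q3a_age / wls;
run;
quit;
```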
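Main Effects Model
Try a model with main effects only:

Analysis of Variance

Source             DF    Chi-Square    Pr > ChiSq
-------------------------------------------------
Intercept           1         15.76        <.0001
q1a                 5         52.18        <.0001
q3a_age             5        366.53        <.0001
Residual            7         13.06        0.0707

The residual (goodness-of-fit) chi-square, 13.06 on 7 DF (p = 0.0707), is identical to the q1a*q3a_age test in the reduced model. Dropping the interaction therefore does not significantly worsen the fit at the 5% level.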
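As with the reduced model, the code is not shown on the slides. A sketch consistent with the table lists the two effects without the bar operator, so that no interaction term is generated.

```sas
/* Main-effects model: q1a and q3a_age, no interaction.
   Inferred from the table; the original program is not shown. */
proc catmod data=t2;
   weight wtr;
   model revsib = q1a q3a_age / wls;
run;
quit;
```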
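Analysis of Weighted Least Squares Estimates

                                Standard       Chi-
Parameter            Estimate      Error     Square    Pr > ChiSq
-----------------------------------------------------------------
Intercept             -1.6354     0.4119      15.76        <.0001
q1a            1      -0.1394     0.3190       0.19        0.6622
               3      -0.3338     0.8170       0.17        0.6828
               4       3.8902     1.2238      10.11        0.0015
               5      -2.8567     0.6279      20.70        <.0001
               6      -1.3913     0.3849      13.07        0.0003
q3a_age        1       0.1185     1.2875       0.01        0.9267
               2      -1.5960     0.3706      18.55        <.0001
               3       1.5098     0.2785      29.40        <.0001
               4      -0.8969     0.2780      10.41        0.0013
               5       0.0673     0.2673       0.06        0.8013

Interpreting the estimates: levels with negative estimates are less likely to have the siblings investigated. This reading holds only if the modeled logit is that of REVSIB = 1; the Response Profiles table in the output shows which level is modeled. Under CATMOD's default effect coding, each estimate is a deviation from the average logit. The last level of each variable (q1a = 7, q3a_age = 6) has no printed row, and its effect is minus the sum of the printed ones.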
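The following worked example reads the main-effects estimates on the probability scale, as a sketch. It assumes CATMOD's default effect (deviation-from-the-mean) coding and a single logit response function. The symbols are ours: μ̂ is the intercept, α̂_j the effect of q1a = j, and γ̂_k the effect of q3a_age = k. The profile (adoptive parent aged 22 to 25) is only an illustration. π̂ is the probability of whichever response level the logit models. With CATMOD's default generalized logit and no RESPONSE statement, a 0/1 response is modeled as log(π₀/π₁), with the last level as reference. In that case positive estimates favour REVSIB = 0, so the Response Profiles table should be checked before reading signs.

```latex
% Effects of the unprinted last levels (effect coding: the effects of a variable sum to zero)
\[
\hat\alpha_{7} = -(-0.1394 - 0.3338 + 3.8902 - 2.8567 - 1.3913) = 0.8310, \qquad
\hat\gamma_{6} = -(0.1185 - 1.5960 + 1.5098 - 0.8969 + 0.0673) = 0.7973
\]
% Linear predictor for an adoptive parent (q1a = 4) aged 22 to 25 (q3a_age = 3)
\[
\hat\eta = \hat\mu + \hat\alpha_4 + \hat\gamma_3 = -1.6354 + 3.8902 + 1.5098 = 3.7646
\]
% Back-transform the logit to the probability of the modeled response level
\[
\hat\pi = \frac{1}{1 + e^{-\hat\eta}} = \frac{1}{1 + e^{-3.7646}} \approx 0.977,
\qquad
\frac{1}{1 + e^{1.6354}} \approx 0.163 \ \text{(intercept alone)}
\]
```

If REVSIB = 1 is the modeled level, 0.977 is the estimated probability that the siblings are investigated for this profile. If REVSIB = 0 is modeled, it is the probability that they are not.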
Conclusion
When the caregiver is an “Adoptive parent” (q1a = 4, estimate 3.89), it is “highly likely” that the siblings will also be investigated.
When the caregiver is aged 22 to 25 (q3a_age = 3, estimate 1.51), the siblings are also likely to be investigated.
Intercept (-1.64): under CATMOD's effect coding, the intercept is the logit averaged over the covariate levels rather than a baseline category. Its negative sign suggests that, for an average caregiver profile, the siblings are usually not reviewed in the case.
Questions
Is WLS more efficient than ML here?
Should the records with “no response” be deleted?
Is “99” the best code to indicate “no response”?
How would the model change if each covariate had fewer categories?
Thank you