# Impact of the Distributional Assumptions of Random Effects on Model Fitting Xueying (Sherri) Zhang Research Scientist California Department of Health Services.

Impact of the Distributional Assumptions of Random Effects on Model Fitting Xueying (Sherri) Zhang Research Scientist California Department of Health Services CDIC/ Tobacco control section

2 Overview

1. Introduction
2. Data Source and Study Population
3. Methods: Model Description and Simulations
4. Results
5. Discussion and Conclusions

3 Introduction

Distribution of the random effects

Random effect model:

$$\operatorname{logit}[E(Y_{ij} \mid b_i)] = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + b_i$$

4 Introduction

Research Questions:

1. What is the impact of the normality assumption for the random effects on the estimates of the fixed effects?
2. When a cluster-level confounder is omitted from the model and the random effects are associated with the covariates in the model, are the estimates from the RE model correct?

5 Data Source and Study Population

1. The Regional Rural Injury Study-II (RRIS-II) was a population-based, prospective cohort study designed to investigate the incidence and consequences of agricultural injury in the five-state region of Minnesota, Wisconsin, North Dakota, South Dakota, and Nebraska in 1999.
2. 3,765 households, including 16,538 persons, participated in the study.
3. We modeled the probability of agricultural activity-related injury (Yes/No). Gender, prior injury, and working hours per week on the agricultural operation were chosen as the covariates.
4. Clustered binary data: family members work on the same operations, and the behaviors of parents and children are potentially similar.

6 Model Description

A generalized linear mixed model (GzLMM) with a random intercept is expressed as:

$$\operatorname{logit}[E(Y_{ij} \mid b_i)] = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + b_i$$

- $Y_{ij}$ indicates whether an agricultural activity-related injury happened in 1999 for the $j$th person in the $i$th family.
- $b_i$ is the random effect for the $i$th family.
- $X_{ijk}$ indicates gender, age, education, marital status, prior injury, working hours on the farm, and the percentage of prior injury within the family (PPI).

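7 Methods

Random effect model: the marginal likelihood for Bernoulli data is obtained by integrating the random effect $b_i$ out of the conditional Bernoulli likelihood, where $P_{ij} = E(Y_{ij} \mid b_i)$ is tied to the covariates through a logit link and $b_i \sim N(0, \sigma^2)$.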
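The likelihood formula itself is not preserved in this transcript. As a sketch, the standard marginal likelihood for a random-intercept logistic model, written with the slide's symbols and assuming the three-covariate model above, would be:

```latex
\[
L(\beta, \sigma^2)
  = \prod_{i} \int_{-\infty}^{\infty}
    \Bigl[ \prod_{j} P_{ij}^{\,y_{ij}} (1 - P_{ij})^{1 - y_{ij}} \Bigr]
    \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\Bigl( -\frac{b_i^2}{2\sigma^2} \Bigr) \, db_i ,
\qquad
\operatorname{logit}(P_{ij})
  = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + b_i .
\]
```

The integral has no closed form, so in practice it is approximated numerically; the approximation used in the study is not stated here.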

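8 Methods

Conditional model:

$$\operatorname{logit}[E(Y_{ij} \mid b_i)] = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + b_i$$

Conditioning on the sufficient statistics $S_{ij}$ for $b_i$ removes $b_i$ from the likelihood, so the conditional likelihood $L$ depends only on the fixed-effect coefficients.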
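The expression for the conditional likelihood is cut off in this transcript. A sketch of the usual conditional logistic likelihood follows. It assumes that the sufficient statistic for $b_i$ is the within-family total $S_i = \sum_j y_{ij}$ and that $x_{ij}$ holds the covariates without the intercept; both are assumptions about the slide's notation.

```latex
\[
L_c(\beta)
  = \prod_{i}
    \frac{\exp\!\bigl( \sum_{j} y_{ij}\, x_{ij}^{\top} \beta \bigr)}
         {\sum_{d \in D_i} \exp\!\bigl( \sum_{j} d_{j}\, x_{ij}^{\top} \beta \bigr)},
\qquad
D_i = \Bigl\{ d \in \{0,1\}^{n_i} : \sum_{j} d_j = S_i \Bigr\},
\quad
S_i = \sum_{j} y_{ij} .
\]
```

Both $\beta_0$ and $b_i$ cancel between numerator and denominator, which is why this likelihood needs no distributional assumption for the random effects. Families in which everyone or no one was injured contribute a factor of 1 and carry no information.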

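9 Methods

Marginal model: the marginal model takes only the fixed effects into account:

$$\operatorname{logit}[E(Y_{ij})] = \beta^*_0 + \beta^*_1 X_{ij1} + \beta^*_2 X_{ij2} + \beta^*_3 X_{ij3}, \qquad \operatorname{Var}(Y_{ij}) = \phi\, v(E(Y_{ij})),$$

where $v(\cdot)$ is the variance function and $\phi$ is the dispersion parameter. It estimates $\beta^*$ by solving a quasi-score equation.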
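The quasi-score function is not preserved in this transcript. As a sketch, the generalized estimating equation usually solved for a marginal model of this kind is shown below. Here $\mu_i$ is the vector of marginal means for family $i$ and $V_i$ is a working covariance matrix; the working correlation structure is an assumption, since the slides do not state it.

```latex
\[
U(\beta^*)
  = \sum_{i} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta^*) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^*},
\quad
V_i = \phi\, A_i^{1/2} R_i(\alpha) A_i^{1/2},
\]
```

where $A_i$ is the diagonal matrix of variance-function values $v(\mu_{ij})$ and $R_i(\alpha)$ is the working correlation matrix.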

10 Simulations (1)

True model for the first simulation:

$$\operatorname{logit}[E(Y_{ij} \mid b_i)] = \beta_0 + \beta_1\,\text{gender} + \beta_2\,\text{workhour} + \beta_3\,\text{priorinj} + b_i$$

Random effects: $nb_i = \sqrt{\sigma^2 / v^2}\,(b_i - \mu)$, so that $nb_i \sim (0, \sigma^2)$, where

- $\sigma^2$ is the estimated variance of the random effects from the true model;
- $v^2$ is the variance of the predicted $b_i$ from the true model;
- $\mu$ is the mean of the predicted $b_i$ from the true model;
- $b_i$ is the predicted random effect from the true model.

$$P_{ij} = \frac{\exp(X_{ij}\beta + nb_i)}{1 + \exp(X_{ij}\beta + nb_i)}$$
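The rescaling step and the inverse-logit formula above can be sketched in Python. This is only an illustration: the function names and input values are made up, and the use of the population variance for $v^2$ is an assumption, since the slides do not say which variance estimator was used.

```python
import math
from statistics import fmean, pvariance


def rescale_random_effects(b_hat, sigma2):
    """Center predicted random effects and rescale them to variance sigma2."""
    mu = fmean(b_hat)              # mean of the predicted b_i
    v2 = pvariance(b_hat, mu)      # variance of the predicted b_i (assumed: population form)
    scale = math.sqrt(sigma2 / v2)
    return [scale * (b - mu) for b in b_hat]


def injury_probability(linear_predictor, nb_i):
    """Inverse logit: exp(z) / (1 + exp(z)) with z = X_ij * beta + nb_i."""
    z = linear_predictor + nb_i
    return 1.0 / (1.0 + math.exp(-z))


# Made-up predicted random effects for four families (not study data).
b_hat = [-0.30, 0.10, 0.45, 0.05]
nb = rescale_random_effects(b_hat, sigma2=0.8)
p = [injury_probability(-2.0, nb_i) for nb_i in nb]
```

After rescaling, the values in `nb` have mean 0 and variance equal to `sigma2` while keeping the shape of the predicted random effects, so no normal distribution is imposed on them.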

11 Simulations (1)

- 1000 different seeds
- Random numbers (R) from U(0,1) for each individual
- If $R < P_{ij}$, SimY = 1; else, SimY = 0, so that Pr(SimY = 1) = $P_{ij}$
- Covariates remain the same as in the real data
- A marginal model, an RE model and a conditional model were fit to each data set.
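The outcome-generation step can be sketched as follows. The probabilities are made-up values, the function name is hypothetical, and the model-fitting step is omitted; only the uniform-draw rule described above is shown.

```python
import random


def simulate_outcomes(probabilities, seed):
    """Draw R ~ U(0,1) per individual and set SimY = 1 when R < P_ij."""
    rng = random.Random(seed)      # one generator per seed, so each data set is reproducible
    return [1 if rng.random() < p else 0 for p in probabilities]


# Made-up probabilities for four individuals (not study data).
p_ij = [0.05, 0.20, 0.50, 0.80]

# One simulated data set per seed; the covariates (here, p_ij) stay fixed.
datasets = [simulate_outcomes(p_ij, seed) for seed in range(1000)]

# Share of data sets in which each individual has SimY = 1.
observed = [sum(d[k] for d in datasets) / len(datasets) for k in range(len(p_ij))]
```

Across the 1000 data sets, each entry of `observed` should be close to the corresponding `p_ij`, which is the property Pr(SimY = 1) = P_ij stated on the slide.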

12 Simulations (2)

True model for the second simulation:

$$\operatorname{logit}[E(Y_{ij} \mid b_i)] = \beta_0 + \beta_1\,\text{gender} + \beta_2\,\text{workhour} + \beta_3\,\text{priorinj} + \beta_4\,\text{PPI} + b_i$$

PPI: percentage of prior injury within the family.

Random effects: $nb_i = \sqrt{\sigma^2 / v^2}\,(b_i - \mu)$, so that $nb_i \sim (0, \sigma^2)$, where

- $\sigma^2$ is the estimated variance of the random effects from the true model;
- $v^2$ is the variance of the predicted $b_i$ from the true model;
- $\mu$ is the mean of the predicted $b_i$ from the true model;
- $b_i$ is the predicted random effect from the true model.

$$P_{ij} = \frac{\exp(X_{ij}\beta + nb_i)}{1 + \exp(X_{ij}\beta + nb_i)}$$

13 Simulations (2)

- 1000 different seeds
- Random numbers (R) from U(0,1)
- If $R < P_{ij}$, SimY = 1; else, SimY = 0, so that Pr(SimY = 1) = $P_{ij}$
- Covariates remain the same as in the real data
- A marginal model, an RE model and a conditional model were fit to each data set.

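14 Results - Results from the model using real data:

| Working hours | Marginal: Estimate | Marginal: 95% C.I. | RE: Estimate | RE: 95% C.I. | Conditional: Estimate | Conditional: 95% C.I. |
|---|---|---|---|---|---|---|
| 0 | -2.78 | (-3.38, -2.18) | -2.90 | (-3.53, -2.26) | -2.50 | (-3.28, -1.73) |
| 0-20 | -1.24 | (-1.55, -0.92) | -1.33 | (-1.66, -0.99) | -1.15 | (-1.65, -0.66) |
| 21-40 | -0.63 | (-0.91, -0.34) | -0.69 | (…, -0.38) | -0.55 | (…, -0.09) |
| 41-60 | -0.31 | (-0.58, -0.04) | -0.33 | (-0.63, -0.04) | -0.13 | (-0.58, 0.32) |
| 61-80 | -0.20 | (-0.46, 0.07) | -0.22 | (-0.51, 0.08) | -0.05 | (-0.49, 0.40) |
| 81+ * | 0.00 | | | | | |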

15 Results - Results from the model using real data:

16 Results - Simulation (1) of the model without PPI:

The true model for the simulation is the RE model:

$$\operatorname{logit}[E(Y_{ij} \mid b_i)] = \beta_0 + \beta_1\,\text{gender} + \beta_2\,\text{workhour} + \beta_3\,\text{priorinj} + b_i$$

1. The average estimates of the RE model are closer to those of the conditional model than to those of the marginal model.
2. The bias for prior injury in the RE model is 0.1130, much larger than the biases from the marginal model and the conditional model: 0.0573 and -0.0036.
3. The MSE for prior injury from the RE model is 0.0182, much larger than the MSEs from the marginal and conditional models: 0.0073 and 0.0089.

| | Marginal: Working hours | Marginal: Gender | Marginal: Prior injury | RE: Working hours | RE: Gender | RE: Prior injury | Conditional: Working hours | Conditional: Gender | Conditional: Prior injury |
|---|---|---|---|---|---|---|---|---|---|
| B_t (true values) | 0.9448 | 0.5716 | 1.1263 | 1.0392 | 0.6287 | 1.2388 | 1.0392 | 0.6287 | 1.2388 |
| B_s (average estimates) | 0.9048 | 0.5172 | 1.1836 | 1.0613 | 0.5765 | 1.3518 | 1.0516 | 0.6249 | 1.2352 |
| Bias (B_s - B_t) | -0.0400 | -0.0544 | 0.0573 | 0.0221 | -0.0522 | 0.1130 | 0.0124 | -0.0038 | -0.0036 |
| MSE, mean((B_i - B_t)^2) | 0.0073 | 0.0081 | 0.0073 | 0.0083 | 0.0092 | 0.0182 | 0.0124 | 0.0083 | 0.0089 |
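The bias and MSE rows follow directly from the definitions in the row labels. A minimal Python sketch is given below; the function name is hypothetical and the three estimates are made-up values, not the 1000 simulated estimates from the study.

```python
from statistics import fmean


def bias_and_mse(estimates, true_value):
    """Return (average estimate B_s, bias B_s - B_t, MSE mean((B_i - B_t)^2))."""
    avg = fmean(estimates)
    bias = avg - true_value
    mse = fmean((b - true_value) ** 2 for b in estimates)
    return avg, bias, mse


# Made-up estimates from three simulated data sets, with true value 1.2.
avg, bias, mse = bias_and_mse([1.2, 1.3, 1.4], true_value=1.2)
```

Applied to the table's own figures, the same subtraction reproduces the bias row, for example 1.3518 - 1.2388 = 0.1130 for prior injury in the RE model.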

| C.I. coverage | Marginal: Working hours | Marginal: Gender | Marginal: Prior injury | RE: Working hours | RE: Gender | RE: Prior injury | Conditional: Working hours | Conditional: Gender | Conditional: Prior injury |
|---|---|---|---|---|---|---|---|---|---|
| 80% C.I. | 0.7770 | 0.6810 | 0.6950 | 0.8080 | 0.7230 | 0.4830 | 0.7990 | 0.8000 | 0.8620 |
| 85% C.I. | 0.8220 | 0.7550 | 0.7450 | 0.8690 | 0.7910 | 0.5340 | 0.8510 | 0.8440 | 0.9130 |
| 90% C.I. | 0.8780 | 0.8270 | 0.7940 | 0.9210 | 0.8320 | 0.6230 | 0.9260 | 0.8880 | 0.9360 |
| 95% C.I. | 0.9400 | 0.8940 | 0.8880 | 0.9640 | 0.9010 | 0.7420 | 0.9710 | 0.9440 | 0.9720 |
| 99% C.I. | 0.9890 | 0.9830 | 0.9720 | 0.9850 | 0.9900 | 0.9030 | 0.9990 | 0.9940 | 0.9950 |
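Each coverage entry is the share of simulated data sets whose confidence interval contains the true value. The sketch below assumes Wald-type intervals (estimate ± z × standard error); the slides do not say how the intervals were built, and the function name and inputs are illustrative only.

```python
from statistics import NormalDist


def coverage(estimates, std_errors, true_value, level):
    """Share of Wald intervals (assumed form) that contain the true value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # e.g. about 1.96 for level = 0.95
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return hits / len(estimates)


# Made-up estimates and standard errors from four simulated data sets.
estimates = [1.0, 1.2, 1.5, 0.9]
std_errors = [0.1, 0.1, 0.1, 0.1]
cov95 = coverage(estimates, std_errors, true_value=1.1, level=0.95)
cov99 = coverage(estimates, std_errors, true_value=1.1, level=0.99)
```

A well-calibrated model should give coverage close to the nominal level, which is why the 0.4830 entry for prior injury in the RE model at the 80% level stands out.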

19 Results - Simulation of the Model with PPI:

Hypothesis for the incorrect estimates for prior injury in the RE model: an important cluster-level confounder related to prior injury was omitted from the model.

- The random effects were significantly associated with prior injury (β = 0.0162, p = 0.0021).
- After PPI was included in the model, the random effects were independent of prior injury (β = 0.0027, p = 0.6101) and of PPI (β = 0.0084, p = 0.3668).

20 Results - Simulation of the Model with PPI:

PPI is a confounder for the effect of prior injury because it is significant in the model (β = 0.5205, p < 0.001) and is also associated with prior injury.

True model for the second simulation:

$$\operatorname{logit}[E(Y_{ij} \mid b_i)] = \beta_0 + \beta_1\,\text{gender} + \beta_2\,\text{workhour} + \beta_3\,\text{priorinj} + \beta_4\,\text{PPI} + b_i$$
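PPI is a cluster-level covariate: every member of a family shares the same value. A minimal Python sketch of how such a variable could be derived from person-level records is shown below. The record layout and function name are hypothetical, the value is returned as a proportion between 0 and 1, and whether the study excluded a person's own prior injury or used a 0 to 100 scale is not stated in the slides.

```python
from collections import defaultdict


def family_ppi(records):
    """Map family_id -> share of members with a prior injury.

    records: iterable of (family_id, prior_injury) pairs, prior_injury in {0, 1}.
    """
    totals = defaultdict(lambda: [0, 0])   # family_id -> [injured count, member count]
    for family_id, prior_injury in records:
        totals[family_id][0] += prior_injury
        totals[family_id][1] += 1
    return {fam: injured / n for fam, (injured, n) in totals.items()}


# Made-up records: family 1 has two members, family 2 has three.
records = [(1, 1), (1, 0), (2, 0), (2, 0), (2, 1)]
ppi = family_ppi(records)
```

Because the value is constant within a family, it can absorb family-level variation that would otherwise be pushed into the random intercept.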

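| | Marginal: Working hours | Marginal: Gender | Marginal: Prior injury | RE: Working hours | RE: Gender | RE: Prior injury | Conditional: Working hours | Conditional: Gender | Conditional: Prior injury |
|---|---|---|---|---|---|---|---|---|---|
| B_t (true values) | 0.9688 | 0.5940 | 0.9827 | 1.0653 | 0.6531 | 1.0805 | 1.0653 | 0.6531 | 1.0805 |
| B_s (average estimates) | 0.9145 | 0.5375 | 1.0935 | 1.0698 | 0.6285 | 1.2517 | 1.0429 | 0.6249 | 1.2389 |
| Bias (B_s - B_t) | -0.0543 | -0.0565 | 0.1109 | 0.0045 | -0.0246 | 0.1712 | -0.0224 | -0.0282 | 0.1584 |
| MSE, mean((B_i - B_t)^2) | 0.0057 | 0.0049 | 0.0067 | 0.0080 | 0.0072 | 0.0088 | 0.0141 | 0.0077 | 0.0097 |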

22 Results -Simulation of the Model with PPI: The average estimates of the RE model are almost equal to those of the conditional model. The biases of the estimates for these three models are very similar to each other. The mean squared errors (MSE) of the estimates of these three models are also similar.

| C.I. coverage | Marginal: Working hours | Marginal: Gender | Marginal: Prior injury | RE: Working hours | RE: Gender | RE: Prior injury | Conditional: Working hours | Conditional: Gender | Conditional: Prior injury |
|---|---|---|---|---|---|---|---|---|---|
| 80% C.I. | 0.8340 | 0.8280 | 0.8540 | 0.8300 | 0.7670 | 0.8350 | 0.8100 | | 0.8030 |
| 85% C.I. | 0.8950 | 0.8640 | 0.8770 | 0.8810 | 0.8340 | 0.8810 | 0.8450 | 0.8700 | 0.8740 |
| 90% C.I. | 0.9260 | 0.9170 | 0.9040 | 0.9200 | 0.8890 | 0.9060 | 0.8970 | 0.9000 | 0.9270 |
| 95% C.I. | 0.9570 | 0.9750 | 0.9510 | 0.9580 | 0.9380 | 0.9480 | 0.9410 | 0.9450 | 0.9630 |
| 99% C.I. | 0.9970 | 1.0000 | 0.9990 | 0.9970 | | | | | |

The remaining 99% values are 0.9920, 1.000 and 0.9900; two entries in that row are missing, so these three cannot be assigned to columns with certainty.

24 Results - Simulation of the Model with PPI:

- The C.I. coverage is often higher than the nominal confidence level. For instance, the coverage of the 80% C.I. for working hours in the marginal model is 83.4%, higher than 80%.
- The C.I. coverage is greatly improved, especially for prior injury in the RE model. For instance, in the first simulation only 48.3% of the 80% confidence intervals for the prior injury estimate cover the true value, but in Table 7, for the model with PPI, 83.5% of the 80% confidence intervals cover the true value.

25 Discussion and Conclusions The fixed effects from the RE model are correct even in cases where the distribution of random effects does not follow the normal distribution. The random effects should be independent of the covariates in the model.

26 Discussion and Conclusions In this project, after PPI was included, the random effects were independent of all the covariates but still did not follow a normal distribution; unknown or unmeasured variables that affect the probability of injury within the family may exist.

27 Discussion and Conclusions One limitation of the project is that the true values for the marginal model were not available. For further study, the impact of the normality assumption for the random effects on the estimates of the random effects is of interest.

Thank You!
