
1 A simulation study of the effect of sample size and level of interpenetration on inference from cross-classified multilevel logistic regression models. Rebecca Vassallo, ESRC Research Methods Festival, July 2012

2 Introduction
-Influence of the interviewer and area on survey response behaviour
-Reflects unmeasured factors including the interviewer’s and area’s characteristics
-Violation of the assumption of independence of observations
-Standard analytical techniques will underestimate standard errors and can result in incorrect inference (Snijders & Bosker, 1999)
-Multilevel modelling has become a popular method for analysing area and interviewer effects on nonresponse

3 Introduction
-Estimation problem relating to the identifiability of area and interviewer variation
-Interpenetrated sample design considered the gold standard for separating interviewer effects from area effects
-Restrictions in field administration capabilities and survey costs often allow only partial interpenetration
-Multilevel cross-classified specification used in such cases (Von Sanden, 2004)
-No studies available examining the properties of parameter estimates from such models under different conditions

4 Study Aims
-Examine the implications of interviewer dispersal patterns within different scenarios on the quality of parameter estimates
-Quality assessed via percentage relative bias, confidence interval coverage, power of significance tests and correlation of random parameter estimates
-Scenarios vary in sample size, overall rate of response, and the area and interviewer variances
-Identify the smallest interviewer pool and the most geographically restrictive interviewer case allocation required for acceptable levels of bias and power

5 Methodology: Simulation Model
Model: logit(p_ijs) = β_0 + u_j + v_s, with u_j ~ N(0, σ_u²) and v_s ~ N(0, σ_v²)
-Stata Version 12 calling MLwiN Version 2.25 through the ‘runmlwin’ command (Leckie & Charlton, 2011)
-Markov chain Monte Carlo (MCMC) estimation method
-MCMC produces less biased estimates than first-order marginal quasi-likelihood (MQL) and second-order penalised quasi-likelihood (PQL) (Browne, 1998; Browne & Draper, 2006)
-Models run on the IRIDIS High Performance Computing Facility cluster at the University of Southampton
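To make the cross-classified structure concrete: the study itself estimates this model by MCMC in MLwiN through runmlwin, but a rough frequentist analogue can be sketched in R with the lme4 package. The data frame dat and its column names (y, area, interviewer) refer to a simulated dataset of the kind built in the sketch under slide 7; they are illustrative, not the authors' code.

```r
# Illustrative only: the study fits this model by MCMC in MLwiN via Stata's
# runmlwin; glmer() from lme4 fits the same crossed random-effects structure
# by approximate maximum likelihood.
library(lme4)

# dat: one simulated dataset with columns y, area and interviewer
# (see the data-generating sketch under slide 7)
fit <- glmer(y ~ 1 + (1 | area) + (1 | interviewer),
             data = dat, family = binomial)
summary(fit)   # beta_0 plus the area and interviewer variance estimates
```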

6 Methodology: Data Generating Procedure
-Overall probability π of the outcome for an area and interviewer with zero random effects determines the overall intercept (fixed for all cases)
-Cluster-specific random effects for each interviewer and area are generated separately from N(0, σ_u²) and N(0, σ_v²)
-u_j and v_s are generated for every simulation, but maintained constant across different scenarios where the only factor that changes is the interviewer case allocation
-The allocation of workload from different areas to specific interviewers is limited to a finite number of possibilities
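A minimal R sketch of this step, using the medium-scenario values from slide 8; the object names are illustrative rather than the authors' code, and pairing u with areas and v with interviewers is an assumption (the slides do not pin the labels down).

```r
# Medium-scenario values (slide 8); names and the u = area / v = interviewer
# labelling are assumptions
set.seed(1)
pi_overall <- 0.8                 # overall outcome probability at zero random effects
sigma2_u   <- 0.3                 # area variance
sigma2_v   <- 0.3                 # interviewer variance
n_areas    <- 120
n_ints     <- 240

beta0 <- qlogis(pi_overall)       # overall intercept, fixed for all cases

# Cluster-specific random effects, drawn separately for areas and interviewers
u <- rnorm(n_areas, mean = 0, sd = sqrt(sigma2_u))
v <- rnorm(n_ints,  mean = 0, sd = sqrt(sigma2_v))
```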

7 Methodology: Data Generating Procedure
-logit(p_ijs) is computed for each case and converted to a probability p_ijs
-Values of the dependent variable Y_ijs - a dichotomous outcome for each case - are generated from a Bernoulli distribution with probability p_ijs
-For each scenario of the experimental design, 1000 simulated datasets are generated using R
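Continuing the sketch under slide 6, the outcome-generation step might look as follows; the interviewer-area allocation shown (each interviewer confined to a single area) is just one illustrative layout, not one of the study's designs.

```r
# One illustrative allocation: 48 cases per area, 24 per interviewer,
# each interviewer working within a single area (the study varies this)
cases_per_area <- 48
area_of_case <- rep(seq_len(n_areas), each = cases_per_area)       # 5760 cases
int_of_case  <- rep(seq_len(n_ints),  each = cases_per_area / 2)   # 24 cases per interviewer

eta <- beta0 + u[area_of_case] + v[int_of_case]   # logit(p_ijs) for each case
p   <- plogis(eta)                                # converted to probabilities p_ijs
y   <- rbinom(length(p), size = 1, prob = p)      # Y_ijs drawn from Bernoulli(p_ijs)

dat <- data.frame(y = y, area = area_of_case, interviewer = int_of_case)
# Repeating this (with fresh u and v) 1000 times gives the simulated datasets for one scenario
```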

8 Methodology: Simulation Factors
Simulated scenarios vary in the following factors:
-the overall sample size (N)
-the number of interviewers and areas (N_ints; N_areas)
-the interviewer-area classifications [which vary in terms of the number of areas each interviewer works in (maximum 6 areas) and the overlap in the interviewers working in neighbouring areas]
-the ICC (variances σ_u² & σ_v²)
-the overall probability of the outcome variable (π)
Medium scenario design (similar to values observed in a real dataset - a realistic starting point): 120 areas (48 cases/area) allocated to 240 interviewers (24 cases/int), totalling 5760 cases, σ_v² = 0.3, σ_u² = 0.3, π = 0.8
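As an illustration of how such a factorial design can be laid out, the grid below combines the factor levels that appear in these slides and in the results; the exact set of scenarios used in the study may differ, and not every combination is feasible.

```r
# Candidate scenario grid built from factor levels mentioned in the slides;
# illustrative only - the study's actual design may differ
scenarios <- expand.grid(
  n_cases       = c(5760, 2880, 1440),  # overall sample size N
  ints_to_areas = c(1, 2),              # N_ints = N_areas or N_ints = 2*N_areas
  areas_per_int = 1:6,                  # interviewer-area classification (up to 6 areas/int)
  sigma2_v      = c(0.2, 0.3),          # interviewer and area variances (ICC)
  sigma2_u      = 0.3,
  pi_overall    = c(0.8, 0.9)           # overall probability of the outcome
)
nrow(scenarios)   # 144 candidate combinations before dropping infeasible allocations
```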

9 Methodology - Quality Assessment Measures
-Correlation between the area and interviewer variance estimates: high negative values indicate identifiability problems
-Percentage relative bias
-Confidence interval coverage: coverage of the 95% Wald confidence interval and of the 95% MCMC quantile interval, compared to the nominal 95%
-Power of the Wald test: proportion of simulations in which the null hypothesis is correctly rejected
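A sketch of how these four measures could be computed from the simulation output, assuming a data frame res with one row per simulated dataset holding the two variance estimates and their standard errors; the column names and the exact form of the Wald interval and test are assumptions, not the authors' definitions.

```r
# 'res' is assumed to hold, per simulated dataset, the estimated variances and
# their standard errors: est_u, se_u, est_v, se_v (names are illustrative)
true_v <- 0.3                                  # true sigma_v^2 in the medium scenario

cor(res$est_u, res$est_v)                      # correlation between variance estimates

100 * (mean(res$est_v) - true_v) / true_v      # percentage relative bias

lo_ci <- res$est_v - 1.96 * res$se_v           # 95% Wald confidence interval
hi_ci <- res$est_v + 1.96 * res$se_v
mean(lo_ci <= true_v & true_v <= hi_ci)        # coverage, compared with the nominal 0.95

mean(res$est_v / res$se_v > 1.96)              # one simple Wald test: share rejecting sigma_v^2 = 0
```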

10 Results - Power of Tests
-For the medium scenario, power ≈ 1 for all interviewer case allocations
-For smaller N, more sparse allocations are required to get power > 0.85
-Lower σ_v² (0.2) results in lower power
-When N_ints = N_areas, more interviewer dispersion is required for acceptable levels of power
-Higher π (0.9) requires 2 areas/int for power > 0.9
-Reduced interviewer overlapping for a constant number of areas/int does not improve power

11 Results - Correlation between σ_v² & σ_u² Estimates
-For all scenarios, high negative correlations (|ρ| > 0.4) are observed when interviewers work in 1 area only
-No substantial change in ρ with varying total sample size
-Very high negative ρ (up to -0.9) for N_ints = N_areas scenarios; |ρ| falls below 0.1 only when interviewers work in 4+ areas (compared to 2+ areas/int for N_ints = 2*N_areas scenarios)
-Higher ρ with increasing π up to 2 areas/int allocations; thereafter no change in ρ with π
-Lower ρ with increasing σ_v² up to 3 areas/int allocations
-Lower ρ with less interviewer overlapping for the 2 areas/int cases

12 Results – Percentage Relative Bias
-In most scenarios with N = 5760, the percentage relative bias is around 1-2% once interviewers are allocated to 2+ areas
-Further interviewer dispersion (3+ areas) and less interviewer overlapping do not yield systematic drops in bias
-When interviewers are working in 2+ areas, the bias in the σ_v² estimate is generally greater than the bias in the σ_u² estimate [when N_ints = 2*N_areas]
-Greater bias is observed for smaller sample sizes, with the scenario of 1440 cases and N_ints = N_areas giving bias values between 5% and 13% for all allocations

13 Results - Confidence Interval Coverage
-Close to the 95% nominal rate in all scenarios
-Some cases of under-coverage or over-coverage in scenarios where interviewers work in just one area:
-87% coverage for the σ_v² CI (N = 5760, N_ints = 2*N_areas, σ_v² = 0.2, σ_u² = 0.3, π = 0.8, one area/int)
-88% coverage for the σ_v² CI (N = 2880 or N = 1440, N_ints = 2*N_areas, σ_v² = 0.3, σ_u² = 0.3, π = 0.8, one area/int)
-100% coverage for the σ_v² and σ_u² CIs (N = 5760, 2880 or 1440, N_ints = N_areas, σ_v² = 0.3, σ_u² = 0.3, π = 0.8, one area/int)
-No clear evidence that the MCMC quantile intervals perform better than the Wald asymptotic normal CIs

14 Conclusion
-Interpenetration not required to distinguish between area and interviewer variation
-Good quality estimates obtained for large sample sizes (≈6000 cases) if interviewers work in at least two areas
-Better estimates obtained when the number of interviewers is greater than the number of areas
-Higher overall probabilities and smaller variances (smaller ICC) require more interviewer dispersion for some survey conditions
-The extent of interviewer overlapping shown not to be important
-Results and their implications can be extended to other applications

15 Acknowledgements
-University of Southampton, School of Social Sciences Teaching Studentship
-UK Economic and Social Research Council (ESRC), PhD Studentship (ES/ /1)
-Gabriele B. Durrant & Peter W. F. Smith, PhD Supervisors

16 References
Browne, W. J. (1998). Applying MCMC Methods to Multi-level Models. PhD thesis, University of Bath.
Browne, W. J. & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1.
Leckie, G. & Charlton, C. (2011). runmlwin: Stata module for fitting multilevel models in the MLwiN software package. Centre for Multilevel Modelling, University of Bristol.
Snijders, T. A. B. & Bosker, R. J. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modelling. London: Sage.
Von Sanden, N. D. (2004). Interviewer Effects in Household Surveys: Estimation and Design. Unpublished PhD thesis, School of Mathematics and Applied Statistics, University of Wollongong.

